Abstract

A sequence of real numbers $(x_n)$ is Benford if the significands, i.e. the fraction
parts in the floating-point representation of $(x_n)$, are distributed logarithmically.
Similarly, a discrete-time irreducible and aperiodic finite-state Markov chain with
probability transition matrix $P$ and limiting matrix $P^*$ is Benford if every
component of both sequences of matrices $(P^n - P^*)$ and $(P^{n+1} - P^n)$ is Benford or
eventually zero. Using recent tools that established Benford behavior both for
Newton's method and for finite-dimensional linear maps, via the classical theories
of uniform distribution modulo 1 and Perron-Frobenius, this paper derives a
simple sufficient condition ("nonresonance") guaranteeing that $P$, or the Markov
chain associated with it, is Benford. This result in turn is used to show that
almost all Markov chains are Benford, in the sense that if the transition
probabilities are chosen independently and continuously, then the resulting Markov
chain is Benford with probability one. Concrete examples illustrate the various
cases that arise, and the theory is complemented with several simulations and
potential applications.

1 Introduction



Benford's Law (BL) is the widely-known logarithmic probability distribution on
significant digits (or equivalently, on significands), and its most familiar form is the
special case of first significant digits (base 10), namely,

$$\mathbb{P}(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right), \quad \forall d_1 \in \{1, 2, \ldots, 9\}, \tag{1}$$

where for each $x \in \mathbb{R}^+$, the number $D_1(x)$ is the first significant digit (base 10)
of $x$, i.e. the unique integer $d \in \{1, 2, \ldots, 9\}$ satisfying $10^k d \le x < 10^k (d + 1)$ for
some, necessarily unique, $k \in \mathbb{Z}$. Thus, for example, $D_1(30122) = D_1(0.030122) =
D_1(3.0122) = 3$, and (1) implies that

$$\mathbb{P}(D_1 = 1) = \log_{10} 2 \approx 0.301, \quad \mathbb{P}(D_1 = 2) = \log_{10}(3/2) \approx 0.176, \quad \text{etc.},$$

see also Table 1 below.

In a form more complete than (1), BL is a statement about joint distributions of
the first $n$ significant digits (base 10) for any $n \in \mathbb{N}$, namely,

$$\mathbb{P}\bigl((D_1, D_2, D_3, \ldots, D_n) = (d_1, d_2, d_3, \ldots, d_n)\bigr)
  = \log_{10}\Bigl(\sum_{j=1}^{n} 10^{n-j} d_j + 1\Bigr) - \log_{10}\Bigl(\sum_{j=1}^{n} 10^{n-j} d_j\Bigr), \tag{2}$$

where $d_1 \in \{1, 2, \ldots, 9\}$ and $d_j \in \{0, 1, 2, \ldots, 9\}$ for $j \ge 2$, and $D_2$, $D_3$, etc.
represent the second, third, etc. significant digit functions (base 10). Thus, for example,
$D_2(30122) = D_2(0.030122) = D_2(3.0122) = 0$, and a special case of (2) is

$$\mathbb{P}\bigl((D_1, D_2, D_3) = (3, 0, 1)\bigr) = \log_{10} 302 - \log_{10} 301 = \log_{10}\left(1 + \tfrac{1}{301}\right) \approx 0.00144.$$
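The two digit laws (1) and (2) are easy to evaluate numerically. The following Python sketch does so; the function names are illustrative choices made here and do not come from the paper.

```python
import math


def first_digit_prob(d: int) -> float:
    """Benford probability (1) that the first significant digit equals d."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must lie in 1..9")
    return math.log10(1 + 1 / d)


def digit_block_prob(digits: list[int]) -> float:
    """Joint probability (2) that the first n significant digits equal `digits`."""
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("d_1 must lie in 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("d_j must lie in 0..9 for j >= 2")
    # The integer with decimal digits d_1 d_2 ... d_n equals sum_j 10^(n-j) d_j.
    m = int("".join(str(d) for d in digits))
    return math.log10(m + 1) - math.log10(m)


if __name__ == "__main__":
    print([round(first_digit_prob(d), 5) for d in range(1, 10)])
    print(round(digit_block_prob([3, 0, 1]), 5))
```

For a single digit the joint law reduces to the first-digit law, which gives a quick consistency check on any implementation.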

Formally, for every $n \in \mathbb{N}$, $n \ge 2$, the number $D_n(x)$, the $n$-th significant digit (base
10) of $x \in \mathbb{R}^+$, is defined inductively as the unique integer $d \in \{0, 1, 2, \ldots, 9\}$ such
that

$$10^k \Bigl(\sum_{j=1}^{n-1} 10^{n-j} D_j(x) + d\Bigr) \le x < 10^k \Bigl(\sum_{j=1}^{n-1} 10^{n-j} D_j(x) + d + 1\Bigr)$$

for some (unique) $k \in \mathbb{Z}$.

The formal probability framework for the significant-digit law is described in [12, 13].
The sample space is the set of positive reals, and the $\sigma$-algebra of events is the
$\sigma$-algebra generated by the (decimal) significand (or mantissa) function
$S : \mathbb{R}^+ \to [1, 10)$, where $S(x)$ is the unique number $s \in [1, 10)$ such that $x = 10^k s$ for some
$k \in \mathbb{Z}$. Equivalently, the significand events are the sets in the $\sigma$-algebra generated
by the significant digit functions $D_1$, $D_2$, $D_3$, etc. The probability measure on this
sample space associated with BL is

$$\mathbb{P}(S \le t) = \log_{10} t, \quad \forall t \in [1, 10).$$

It is easy to see that the significant digit functions $D_1$ and $D_2$, $D_3$, etc. are
well-defined $\{1, 2, \ldots, 9\}$- and $\{0, 1, 2, \ldots, 9\}$-valued random variables, respectively, on this
probability space with probability mass functions as given in (1) and (2).
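The significand function $S$ and the digit functions $D_n$ can be sketched with Python's `decimal` module, which avoids the rounding problems of binary floating point near powers of ten. The helper names below are assumptions made for illustration, and numbers are passed as decimal strings so that their digits are taken literally.

```python
from decimal import Decimal


def significand(x: str) -> Decimal:
    """Decimal significand S(x) in [1, 10) of a non-zero number given as a string."""
    d = abs(Decimal(x))
    if d == 0:
        raise ValueError("S(x) is only defined for x != 0")
    # adjusted() is the exponent k of the leading digit, so d * 10^(-k) lies in [1, 10).
    return d.scaleb(-d.adjusted())


def significant_digit(x: str, n: int) -> int:
    """n-th significant digit D_n(x) in base 10, with the convention D_n(0) := 0."""
    d = Decimal(x)
    if d == 0:
        return 0
    digits = d.as_tuple().digits  # coefficient digits, no leading zeros
    return digits[n - 1] if n <= len(digits) else 0


if __name__ == "__main__":
    for value in ("30122", "0.030122", "3.0122"):
        print(value, significand(value), significant_digit(value, 1), significant_digit(value, 2))
```

All three sample inputs share the significand 3.0122, which is why they share every significant digit.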

Note. Throughout this article, all results are restricted to decimal (base 10) significant
digits, and accordingly $\log$ always denotes the base 10 logarithm. For notational
convenience, $D_n(0) := 0$ for all $n \in \mathbb{N}$. The results carry over easily to arbitrary bases
$b \in \mathbb{N} \setminus \{1\}$, as is evident from [2], where the essential difference is replacing $\log_{10}$ by
$\log_b$, and the decimal significant digits by the base-$b$ significant digits.

Benford's Law is now known to hold in great generality, e.g. for classical combinatorial
sequences such as $(2^n)$, $(n!)$ and the Fibonacci numbers $(F_n)$; iterations of linearly- or
nonlinearly-dominated functions; solutions of ordinary differential equations; products
of independent random variables; random mixtures of data; and random maps (e.g.,
see [13]). Table 1 compares the empirical frequencies of $D_1$ for the first
1000 terms of the sequences $(2^n)$, $(n!)$ and $(F_n)$. These empirical frequencies illustrate
what it means to follow BL and also foreshadow the simulations in Section 5.
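Empirical first-digit frequencies of this kind can be recomputed exactly with integer arithmetic. The sketch below is an illustration rather than the authors' code: the generator names and the choice to start each sequence at its first term ($2^1$, $1!$, $F_1 = F_2 = 1$) are assumptions, so the output need not agree with a published table digit for digit.

```python
from collections import Counter
from itertools import islice
import math


def leading_digit(k: int) -> int:
    """First significant digit of a positive integer."""
    return int(str(k)[0])


def powers_of_two():
    x = 1
    while True:
        x *= 2
        yield x  # 2, 4, 8, ...


def factorials():
    x, n = 1, 0
    while True:
        n += 1
        x *= n
        yield x  # 1!, 2!, 3!, ...


def fibonacci():
    a, b = 1, 1
    while True:
        yield a  # 1, 1, 2, 3, 5, ...
        a, b = b, a + b


def digit_frequencies(seq, count: int = 1000) -> dict[int, float]:
    """Relative frequencies of the first digit among the first `count` terms."""
    tally = Counter(leading_digit(x) for x in islice(seq, count))
    return {d: tally[d] / count for d in range(1, 10)}


if __name__ == "__main__":
    for name, gen in (("2^n", powers_of_two), ("n!", factorials), ("F_n", fibonacci)):
        freq = digit_frequencies(gen())
        print(name, [round(freq[d], 3) for d in range(1, 10)])
    print("BL ", [round(math.log10(1 + 1 / d), 3) for d in range(1, 10)])
```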

The main contribution of this article is to adapt recent results on BL in the
multi-dimensional setting ([2]) in order to establish BL in finite-dimensional,
time-homogeneous Markov chains, and to suggest several applications including error
analysis in numerical simulations of $n$-step transition matrices.

Concretely, given the transition matrix $P$ of a finite-state Markov chain (i.e., $P$ is a
row-stochastic matrix), a common problem is to estimate the limit $P^* = \lim_{n\to\infty} P^n$.
The two main theoretical results below, Theorems A and B, respectively, show that

  D_1     (2^n)     (n!)      (F_n)     Benford
   1      0.293     0.292     0.301     0.30103
   2      0.176     0.180     0.176     0.17609
   3      0.124     0.126     0.126     0.12493
   4      0.102     0.098     0.096     0.09691
   5      0.087     0.081     0.079     0.07918
   6      0.069     0.068     0.067     0.06694
   7      0.051     0.057     0.057     0.05799
   8      0.051     0.053     0.053     0.05115
   9      0.047     0.045     0.045     0.04575

Table 1: Empirical frequencies of $D_1$ for the first 1000 terms of the sequences $(2^n)$,
$(n!)$ and the Fibonacci numbers $(F_n)$, as compared with the Benford probabilities.

under a natural condition ("nonresonance") every component of the sequence of
matrices $(P^n - P^*)$ and $(P^{n+1} - P^n)$ obeys BL, and that this behavior is typical, i.e.,
it occurs for almost all Markov chains. Simulations are provided for illustration,
followed by several potential applications including the estimation of roundoff errors
incurred when estimating $P^*$ from $P^n$, and possible (partial negative) statistical tests
to decide whether data comes from a finite-state Markov process.

2 Benford Markov chains and main tools 

The sets of natural, integer, rational, positive real, real and complex numbers are
symbolized by $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}^+$, $\mathbb{R}$ and $\mathbb{C}$, respectively. The real part, imaginary part,
complex conjugate and absolute value (modulus) of a number $z \in \mathbb{C}$ are denoted by
$\Re z$, $\Im z$, $\bar{z}$ and $|z|$, respectively. For $z \neq 0$, the argument $\arg z$ is the unique number
in $(-\pi, \pi]$ that satisfies $z = |z| e^{i \arg z}$. For ease of notation, $\arg 0 := 0$ and $\log 0 := 0$.
The cardinality of the finite set $A$ is $\#A$. Throughout this article, the sequence
$(a(1), a(2), a(3), \ldots)$ is denoted by $(a(n))$. Thus, for example, $(a^n) = (a^1, a^2, a^3, \ldots)$
and $(P^{n+1} - P^n) = (P^2 - P^1, P^3 - P^2, P^4 - P^3, \ldots)$. Boldface symbols indicate
randomized quantities, e.g. $\mathbf{X}$ denotes a random variable or vector and $\mathbf{P}$ a random
transition probability matrix.

Definition 2.1. A sequence $(x_n)$ of real numbers is Benford ("follows BL") if

$$\lim_{N\to\infty} \frac{\#\{n \le N : S(|x_n|) \le t\}}{N} = \log t, \quad \forall t \in [1, 10).$$

The main subject of this paper is the Benford behavior of finite-state Markov chains.
The theory uses three main tools: the classical theory of uniform distribution modulo
1, see e.g. [16]; recent results for BL in one- and multi-dimensional dynamical
systems (see e.g. [2]); and the classical Perron-Frobenius theory for Markov chains, see e.g.
[6, 19]. The first lemma records the relationship between uniform distribution theory
and BL, and the second lemma is an application establishing BL for certain basic
sequences that will be used repeatedly below. Here and throughout, the term uniformly
distributed modulo 1 is abbreviated as u.d. mod 1.

Lemma 2.2 ([8]). A sequence $(x_n)$ of real numbers is Benford if and only if $(\log |x_n|)$
is u.d. mod 1.

An immediate application of Lemma 2.2 is the following useful lemma.

Lemma 2.3. Let $(x_n)$ be Benford. Then for all $a \in \mathbb{R}$ and $k \in \mathbb{Z}$ with $ak \neq 0$,
the sequence $(a x_n^k)$ is also Benford.

Lemmas 2.2 and 2.3 are fundamental tools for analyzing BL in the setting of
multi-dimensional dynamical systems ([2]), and although those results do not apply
directly to the Markov chain setting, the first part of the theory established below
relies heavily on those ideas specialized to the case of row-stochastic matrices.

The next lemma follows easily from known results. It is included here since these
observations play a central role in determining whether a Markov chain is Benford,
as illustrated in the three examples following the lemma. Stronger conclusions are
possible, as suggested in Example 2.5(iii) below, but are not needed here.

Lemma 2.4. Let $a, b, \alpha, \beta$ be real numbers with $a \neq 0$ and $|\alpha| > |\beta|$. Then $(a\alpha^n + b\beta^n)$
is Benford if and only if $\log |\alpha|$ is irrational.

Proof. Since $|\alpha| > |\beta|$, the significands of $\alpha^n$ dominate those of $\beta^n$ asymptotically, so
the conclusion follows from Lemma 2.2, Lemma 2.3 and Weyl's classical theorem that
iterations of an irrational rotation on the circle are uniformly distributed. □

Example 2.5.

(i) The sequences $(2^n)$, $(0.2^n)$, $(3^n)$, $(0.3^n)$ are Benford, whereas $(10^n)$, $(0.1^n)$,
$(\sqrt{10}^{\,n})$ are not Benford.

(ii) The sequence $(0.01 \cdot 0.2^n + 0.2 \cdot 0.01^n)$ is Benford, whereas $(0.1 \cdot 0.02^n + 0.02 \cdot 0.1^n)$
is not Benford.

(iii) The sequence $(0.2^n + (-0.2)^n)$ is not Benford, since all odd terms are zero, but
$(0.2^n + (-0.2)^n + 0.03^n)$ is Benford, although this does not follow directly
from Lemma 2.4.

Notation. For every integer $d > 1$, the set of all row-stochastic matrices of size $d \times d$
is denoted by $\mathcal{P}_d$.

Now, let $P \in \mathcal{P}_d$ be the transition probability matrix of a Markov chain. All
Markov chains (or their associated matrices $P$) considered in this work are assumed
to be finite-state (with $d > 1$ states), irreducible and aperiodic. Let $\lambda_1, \ldots, \lambda_s$, $s \le d$,
be the distinct (possibly non-real) eigenvalues of the stochastic matrix $P$, with
corresponding spectrum $\sigma(P) = \{\lambda_1, \ldots, \lambda_s\}$, i.e., $\sigma(P)$ is the set of all distinct eigenvalues.
Accordingly, the set $\sigma(P)^+ = \{\lambda \in \sigma(P) : \Im \lambda \ge 0\}$ forms the "upper half" of the
spectrum. The usage of $\sigma(P)^+$ refers to the fact that non-real eigenvalues of real
matrices always occur in conjugate pairs, so the set $\sigma(P)^+$ only includes one of the
conjugates. Without loss of generality, throughout this work it is also assumed that
the eigenvalues in $\sigma(P)$ are labeled such that

$$|\lambda_1| \ge |\lambda_2| \ge \ldots \ge |\lambda_s|.$$

Furthermore, the column vectors $u_1, \ldots, u_s$ and $v_1, \ldots, v_s$ denote associated sequences
of left and right eigenvectors, respectively. The third main tool in this paper is
the classical Perron-Frobenius theory of Markov chains, and the following lemma
summarizes some of the special properties of transition matrices for ease of reference.

Lemma 2.6. Suppose $P \in \mathcal{P}_d$ is irreducible and aperiodic. Then $\lambda_1 = 1 > |\lambda_\ell|$ for
all $\ell = 2, \ldots, s$, and there exists a $P^* \in \mathcal{P}_d$ such that

(i) $\lim_{n\to\infty} P^n = P^*$;

(ii) for every $n \in \mathbb{N}$,

$$P^n - P^* = \lambda_2^n C_2 + \ldots + \lambda_s^n C_s, \tag{3}$$

where each $C_\ell$ is a $d \times d$-matrix whose components $c_\ell^{(i,j)}$ are polynomials in $n$
with complex coefficients and degrees $k_\ell^{(i,j)} < d$.

Proof. Immediate from the Perron-Frobenius theorem, see e.g. [18]. □

The second dominant eigenvalue $\lambda_2$ plays an important role whenever $C_2 \neq 0$. The
analysis is especially straightforward if all eigenvalues are simple, i.e., if $\#\sigma(P) = d$.
In this case, for every $n \in \mathbb{N}$,

$$P^n - P^* = \sum_{\ell=2}^{d} \lambda_\ell^n B_\ell \quad \text{and} \quad P^{n+1} - P^n = \sum_{\ell=2}^{d} \lambda_\ell^n (\lambda_\ell - 1) B_\ell \tag{4}$$

holds with the $d - 1$ matrices $B_\ell = v_\ell u_\ell^\top / v_\ell^\top u_\ell \in \mathbb{C}^{d \times d}$. Next is the key definition in
this paper.

Definition 2.7. A Markov chain, or its associated transition probability matrix $P$,
is Benford if each component of $(P^n - P^*)$ and $(P^{n+1} - P^n)$ is either Benford or
eventually zero.

The following examples illustrate the notions of Benford and non-Benford Markov 
chains. 



Example 2.8. (Examples of Benford Markov chains) 



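(i) Let $d = 2$ and $P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$. By [10, p. 432], $P^* = \frac{1}{7} \begin{bmatrix} 4 & 3 \\ 4 & 3 \end{bmatrix}$ and

$$P^n - P^* = \frac{0.3^n}{7} \begin{bmatrix} 3 & -3 \\ -4 & 4 \end{bmatrix} \quad \text{and} \quad P^{n+1} - P^n = 0.3^n \begin{bmatrix} -0.3 & 0.3 \\ 0.4 & -0.4 \end{bmatrix}$$

holds for all $n \in \mathbb{N}$. In both sequences every component is a multiple of $(0.3^n)$,
and hence Benford by Lemma 2.4, since $\log 0.3$ is irrational. The two-dimensional
case will be discussed in more generality in Examples 3.5 and 4.2.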
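The closed form for $P^n - P^*$ in this two-state example can be confirmed with exact rational arithmetic. The following Python sketch is a minimal check under the assumption that the matrices are as stated in part (i); the helper names `matmul`, `matpow`, `deviation` and `closed_form` are introduced here for illustration.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(p, n):
    """n-th power of a square matrix by repeated multiplication (n >= 0)."""
    size = len(p)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


P = [[F(7, 10), F(3, 10)], [F(4, 10), F(6, 10)]]
P_STAR = [[F(4, 7), F(3, 7)], [F(4, 7), F(3, 7)]]
B = [[F(3, 7), F(-3, 7)], [F(-4, 7), F(4, 7)]]  # claimed: P^n - P* = 0.3^n * B


def deviation(n):
    """Exact matrix P^n - P*."""
    pn = matpow(P, n)
    return [[pn[i][j] - P_STAR[i][j] for j in range(2)] for i in range(2)]


def closed_form(n):
    """Closed-form expression 0.3^n * B."""
    lam = F(3, 10) ** n
    return [[lam * B[i][j] for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    print(all(deviation(n) == closed_form(n) for n in range(1, 25)))
```

Because every entry is a `Fraction`, equality here is exact and not subject to floating-point tolerance.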



(ii) Let $d = 3$ and $P = \begin{bmatrix} 0.9 & 0.0 & 0.1 \\ 0.6 & 0.3 & 0.1 \\ 0.1 & 0.0 & 0.9 \end{bmatrix}$. It is easy to check via spectral
decomposition (e.g. [6]) that the eigenvalues of $P$ are $\lambda_1 = 1$, $\lambda_2 = 0.8$ and $\lambda_3 = 0.3$,
and $P^* = \begin{bmatrix} 0.5 & 0 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0 & 0.5 \end{bmatrix}$. The three eigenvalues are distinct, leading to

$$P^n - P^* = 0.8^n \begin{bmatrix} 0.5 & 0 & -0.5 \\ 0.5 & 0 & -0.5 \\ -0.5 & 0 & 0.5 \end{bmatrix} + 0.3^n \begin{bmatrix} 0 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

as well as

$$P^{n+1} - P^n = 0.8^n \begin{bmatrix} -0.1 & 0 & 0.1 \\ -0.1 & 0 & 0.1 \\ 0.1 & 0 & -0.1 \end{bmatrix} + 0.3^n \begin{bmatrix} 0 & 0 & 0 \\ 0.7 & -0.7 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

As can be seen directly, in both cases the components (1, 2) and (3, 2) are zero
for all $n$, whereas by Lemma 2.4 all other components follow BL. Hence, the
Markov chain defined by the transition probability matrix $P$ is Benford.

As will be observed later, the moduli of the eigenvalues as well as a specific
rational relationship between them play a crucial role in the analysis of BL in
Markov chains, similar to the results in [2].



Example 2.9. (Examples of non-Benford Markov chains) 



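(i) Let $d = 2$ and $P = \begin{bmatrix} 0.2 & 0.8 \\ 0.1 & 0.9 \end{bmatrix}$, hence $P^* = \frac{1}{9} \begin{bmatrix} 1 & 8 \\ 1 & 8 \end{bmatrix}$ and, for every $n \in \mathbb{N}$,

$$P^n - P^* = \frac{0.1^n}{9} \begin{bmatrix} 8 & -8 \\ -1 & 1 \end{bmatrix} \quad \text{and} \quad P^{n+1} - P^n = 0.1^n \begin{bmatrix} -0.8 & 0.8 \\ 0.1 & -0.1 \end{bmatrix}.$$

Since $\log 0.1$ is rational, Lemma 2.4 implies that no component of $(P^n - P^*)$ or
$(P^{n+1} - P^n)$ is Benford. For example, $D_1(|(P^n - P^*)^{(1,1)}|) = 8$ for all $n \in \mathbb{N}$.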



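(ii) Let $d = 3$ and $P = \begin{bmatrix} 0.0 & 0.1 & 0.9 \\ 0.1 & 0.3 & 0.6 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$. The eigenvalues of $P$ are $\lambda_1 = 1$,
$\lambda_2 = 0.2$ and $\lambda_3 = -0.1$. Since these three eigenvalues are distinct, again by
spectral decomposition,

$$P^n - P^* = \frac{0.2^n}{8} \begin{bmatrix} 0 & -1 & 1 \\ 0 & 7 & -7 \\ 0 & -1 & 1 \end{bmatrix} + \frac{(-0.1)^n}{11} \begin{bmatrix} 10 & 0 & -10 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$$

as well as

$$P^{n+1} - P^n = 0.2^n \begin{bmatrix} 0 & 0.1 & -0.1 \\ 0 & -0.7 & 0.7 \\ 0 & 0.1 & -0.1 \end{bmatrix} + (-0.1)^n \begin{bmatrix} -1 & 0 & 1 \\ 0.1 & 0 & -0.1 \\ 0.1 & 0 & -0.1 \end{bmatrix}.$$

The first column of $B_2$ is zero, hence for that column the relevant eigenvalue is
$\lambda_3 = -0.1$. Since $\log 0.1$ is rational, no component in the first column of either
sequence $(P^n - P^*)$ and $(P^{n+1} - P^n)$ follows BL, i.e., $P$ is not Benford.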

3 Sufficient condition that a Markov chain is Benford 

To analyze the behavior of the sequences $(P^n - P^*)$ and $(P^{n+1} - P^n)$ associated
with a Markov chain, a nonresonance condition on $P$ will be helpful. Recall that real
numbers $x_1, \ldots, x_k$ are rationally independent (or $\mathbb{Q}$-independent) if $\sum_{j=1}^{k} q_j x_j = 0$
with $q_1, \ldots, q_k \in \mathbb{Q}$ implies that $q_j = 0$ for all $j = 1, \ldots, k$; otherwise $x_1, \ldots, x_k$ are
rationally dependent.

Definition 3.1. A stochastic matrix $P$ is nonresonant if every nonempty subset $\Lambda_0 =
\{\lambda_{i_1}, \ldots, \lambda_{i_k}\} \subset \sigma(P)^+ \setminus \{\lambda_1\}$ with $|\lambda_{i_1}| = \ldots = |\lambda_{i_k}| = L_0$ satisfies $\#(\Lambda_0 \cap \mathbb{R}) \le 1$,
and the numbers $1$, $\log L_0$ and the elements of $\frac{1}{2\pi} \arg \Lambda_0$ are rationally independent,
where

$$\tfrac{1}{2\pi} \arg \Lambda_0 := \left\{ \tfrac{1}{2\pi} \arg \lambda_{i_1}, \ldots, \tfrac{1}{2\pi} \arg \lambda_{i_k} \right\} \setminus \left\{ 0, \tfrac{1}{2} \right\}.$$

A Markov chain is nonresonant whenever its transition probability matrix is. A
stochastic matrix or Markov chain is resonant if it is not nonresonant.

Notice that for $P$ to be nonresonant, it is required specifically that the logarithms of
the moduli of all the eigenvalues other than $\lambda_1 = 1$ are irrational; in particular, $P$ has
to be invertible. Theorem A below establishes that nonresonance is sufficient for $P$ to
be Benford. There is a close correspondence between Definition 3.1 of a nonresonant
matrix and the notion of a matrix not having 10-resonant spectrum, as introduced in
[2]. The main difference is that the eigenvalue $\lambda_1 = 1$ is excluded in Definition 3.1
whereas every stochastic matrix has 10-resonant spectrum.

Example 3.2. (Examples of nonresonant matrices) 

(i) Both transition matrices in Example 2.8 are nonresonant.

(ii) Let $d = 5$ and

$$P = \begin{bmatrix} 0.0 & 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.0 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.0 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 & 0.0 \end{bmatrix}.$$

The eigenvalues of $P$ are $\lambda_1 = 1$ and $\lambda_2 = -0.25$ (with multiplicity four), so $\Lambda_0 = \{-0.25\}$, with
$L_0 = 0.25$ and $\frac{1}{2\pi} \arg \Lambda_0 = \emptyset$. Since $\log 0.25$ is irrational, $P$ is nonresonant.

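Example 3.3. (Examples of resonant matrices)

(i) Two real eigenvalues of opposite sign: Let $d = 3$ and $P = \begin{bmatrix} 0.6 & 0.4 & 0.0 \\ 0.8 & 0.0 & 0.2 \\ 0.0 & 0.6 & 0.4 \end{bmatrix}$.
The eigenvalues of $P$ are $\lambda_1 = 1$ and $\lambda_{2,3} = \pm\sqrt{0.2}$. Notice that $\log |\lambda_2| =
\log |\lambda_3| = -\frac{1}{2} \log 5$ is irrational. With $\Lambda_0 = \{\sqrt{0.2}, -\sqrt{0.2}\}$ clearly $\#(\Lambda_0 \cap \mathbb{R}) = 2$,
hence $P$ is resonant. The spectral decomposition (4) yields

$$(P^n - P^*)^{(1,1)} = 0.2 \lambda_2^n + 0.2 \lambda_3^n = \begin{cases} 0.4 \cdot 0.2^{n/2} & \text{if } n \text{ is even,} \\ 0 & \text{if } n \text{ is odd,} \end{cases}$$

showing that $P$ is not Benford either.

(ii) Eigenvalues with rational logarithms: Let $d = 3$ and $P = \begin{bmatrix} 0.0 & 0.1 & 0.9 \\ 0.5 & 0.1 & 0.4 \\ 0.3 & 0.3 & 0.4 \end{bmatrix}$.
The eigenvalues are $\lambda_1 = 1$ and $\lambda_{2,3} = -0.25 \pm 0.05 i \sqrt{15}$. Since $\log |\lambda_{2,3}| = -0.5$
is rational, the matrix $P$ is resonant.

(iii) Eigenvalues with rational argument: Let $d = 3$ and $P = \begin{bmatrix} 0.3 & 0.3 & 0.4 \\ 0.3 & 0.5 & 0.2 \\ 0.1 & 0.7 & 0.2 \end{bmatrix}$.
The eigenvalues are $\lambda_1 = 1$ and $\lambda_{2,3} = \pm 0.2i$. Note that $\log |0.2i| = -1 + \log 2$
is irrational, but $\frac{1}{2\pi} \arg(0.2i) = \frac{1}{4}$ is rational. Thus $P$ is resonant. Spectral
decomposition gives $B_2^{(2,2)} = B_3^{(2,2)} = \frac{1}{4}$, hence

$$(P^n - P^*)^{(2,2)} = \tfrac{1}{4} \left( (0.2i)^n + (-0.2i)^n \right) = \begin{cases} \tfrac{1}{2} (-1)^{n/2} \cdot 0.2^n & \text{if } n \text{ is even,} \\ 0 & \text{if } n \text{ is odd,} \end{cases}$$

which in turn shows that $P$ is not Benford.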
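The parity pattern behind parts (i) and (iii) can be verified exactly: in each case one diagonal entry of $P^n$ returns to its stationary value at every odd $n$. The Python sketch below checks this for the (1,1) entry in part (i) and the (2,2) entry in part (iii). The stationary vectors are hard-coded values computed for this illustration, not quoted from the text, and the code verifies their stationarity before using them.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def is_stationary(p, pi):
    """True if pi is a probability vector with pi P = pi."""
    n = len(p)
    return sum(pi) == 1 and all(sum(pi[k] * p[k][j] for k in range(n)) == pi[j] for j in range(n))


def diagonal_deviations(p, pi, i, n_max):
    """Exact values (P^n)^{(i,i)} - pi_i for n = 1..n_max (i is 0-based)."""
    out, power = [], p
    for _ in range(n_max):
        out.append(power[i][i] - pi[i])
        power = matmul(power, p)
    return out


# Part (i): eigenvalues 1 and +-sqrt(0.2).
P_I = [[F(6, 10), F(4, 10), F(0)], [F(8, 10), F(0), F(2, 10)], [F(0), F(6, 10), F(4, 10)]]
PI_I = [F(6, 10), F(3, 10), F(1, 10)]
# Part (iii): eigenvalues 1 and +-0.2i.
P_III = [[F(3, 10), F(3, 10), F(4, 10)], [F(3, 10), F(5, 10), F(2, 10)], [F(1, 10), F(7, 10), F(2, 10)]]
PI_III = [F(1, 4), F(1, 2), F(1, 4)]

if __name__ == "__main__":
    print(diagonal_deviations(P_I, PI_I, 0, 6))
    print(diagonal_deviations(P_III, PI_III, 1, 6))
```

A sequence that vanishes on every second index has half of its terms with no significand at all, which is why neither sequence can be Benford.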



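(iv) Eigenvalues leading to rational dependencies within $\{1, \log L_0\} \cup \frac{1}{2\pi} \arg \Lambda_0$: Let
$d = 7$ and

$$P = \begin{bmatrix} 0.2 & 0.1 & 0.0 & 0.0 & 0.1 & 0.0 & 0.6 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.2 & 0.0 & 0.4 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.2 & 0.3 \\ 0.0 & 0.2 & 0.3 & 0.0 & 0.2 & 0.0 & 0.3 \\ 0.1 & 0.2 & 0.1 & 0.1 & 0.0 & 0.1 & 0.4 \\ 0.2 & 0.0 & 0.2 & 0.1 & 0.1 & 0.0 & 0.4 \\ 0.1 & 0.2 & 0.2 & 0.0 & 0.0 & 0.0 & 0.5 \end{bmatrix}.$$

The characteristic polynomial $\psi_P$ of $P$ factors as

$$\psi_P(\lambda) = (\lambda - 1) \left( \lambda^2 + 0.1 \lambda - 0.01 \right) \left( \lambda^2 - 0.01(2 - i) \right) \left( \lambda^2 - 0.01(2 + i) \right).$$

The roots of the second factor are $-\frac{1}{20} (1 \pm \sqrt{5})$; the third factor has roots

$$\pm \tfrac{1}{10} \sqrt{2 - i} = \pm \tfrac{1}{20} \left( \sqrt{4 + 2\sqrt{5}} - i \sqrt{-4 + 2\sqrt{5}} \right),$$

and the fourth factor has roots

$$\pm \tfrac{1}{10} \sqrt{2 + i} = \pm \tfrac{1}{20} \left( \sqrt{4 + 2\sqrt{5}} + i \sqrt{-4 + 2\sqrt{5}} \right).$$

Thus, the dominated positive spectrum is

$$\sigma(P)^+ \setminus \{\lambda_1\} = \tfrac{1}{20} \left\{ -(\sqrt{5} + 1), \ \sqrt{5} - 1, \ -2\sqrt{2 - i}, \ 2\sqrt{2 + i} \right\}.$$

Clearly, the logarithms of the absolute values of the two real eigenvalues are
irrational. The four non-real eigenvalues all have the same modulus $L_0 = \frac{1}{10} 5^{1/4}$
(different from the two real eigenvalues), and $\log L_0 = -1 + \frac{1}{4} \log 5$ is irrational.
Let $\Lambda_0 = \frac{1}{10} \{ -\sqrt{2 - i}, \sqrt{2 + i} \}$. Notice that $\arg(2 \mp i) = \mp \arctan \frac{1}{2}$, so

$$\tfrac{1}{2\pi} \arg \Lambda_0 = \left\{ \tfrac{1}{2} - \tfrac{1}{4\pi} \arctan \tfrac{1}{2}, \ \tfrac{1}{4\pi} \arctan \tfrac{1}{2} \right\} =: \{x_3, x_4\}.$$

Since

$$-1 \cdot 1 + 0 \cdot \log L_0 + 2 \cdot x_3 + 2 \cdot x_4 = 0,$$

the elements of $\{1, \log L_0\} \cup \frac{1}{2\pi} \arg \Lambda_0$ are $\mathbb{Q}$-dependent, and hence $P$ is resonant.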

The first main theoretical result of this paper is 

Theorem A. Every nonresonant irreducible and aperiodic finite-state Markov chain 
is Benford. 

The proof of Theorem A makes use of the following

Lemma 3.4. Let $m \in \mathbb{N}$ and assume that $1, \rho_0, \rho_1, \ldots, \rho_m$ are $\mathbb{Q}$-independent, $(z_n)$
is a convergent sequence in $\mathbb{C}$, and at least one of the $2m$ numbers $c_1, \ldots, c_{2m} \in \mathbb{C}$ is
non-zero. Then, for every $\alpha \in \mathbb{R}$, the sequence

$$\left( n \rho_0 + \alpha \log n + \log |\xi_n| \right) \tag{5}$$

is u.d. mod 1, where

$$\xi_n := c_1 e^{2\pi i n \rho_1} + c_2 e^{-2\pi i n \rho_1} + \ldots + c_{2m-1} e^{2\pi i n \rho_m} + c_{2m} e^{-2\pi i n \rho_m} + z_n.$$

Proof. Follows directly as in the proof of [2, Lemma 2.9], which considers $\log |\Re \xi_n|$
in place of $\log |\xi_n|$ in (5). □

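Proof of Theorem A. By Lemma 2.6(i), $\lim_{n\to\infty} P^n = P^*$ exists for the Markov chain
defined by $P$. Fix $(i, j) \in \{1, \ldots, d\}^2$. As the analysis of $(P^{n+1} - P^n)^{(i,j)}$ is completely
analogous, only $(P^n - P^*)^{(i,j)}$ will be considered here. Write $c_\ell^{(i,j)}$ for the $(i,j)$
component of $C_\ell$ in (3), a polynomial in $n$ of degree $k_\ell^{(i,j)}$. If $(P^n - P^*)^{(i,j)}$ as given by
(3) is not equal to zero for all but finitely many $n$, let $s_{ij} \in \{1, \ldots, s\}$ be the minimal
index such that $c_{s_{ij}}^{(i,j)} \neq 0$. As in [2, p. 224], to analyze $(P^n - P^*)^{(i,j)}$ distinguish two cases.

Case 1: $|\lambda_{s_{ij}}| > |\lambda_{s_{ij}+1}|$.

In this case $\lambda_{s_{ij}}$ is a dominant eigenvalue, and it is real since otherwise its conjugate
would be an eigenvalue with the same modulus. With $k := k_{s_{ij}}^{(i,j)}$, equation (3) can be written as

$$(P^n - P^*)^{(i,j)} = \lambda_{s_{ij}}^n n^{k} \left( \hat{c}^{(i,j)} + \zeta_{ij}(n) \right),$$

where

$$\hat{c}^{(i,j)} := \lim_{n\to\infty} n^{-k} c_{s_{ij}}^{(i,j)} \neq 0,$$

and $\zeta_{ij}(n) \to 0$ as $n \to \infty$ because $\lambda_{s_{ij}}$ is a dominating eigenvalue. Therefore,

$$\log \left| (P^n - P^*)^{(i,j)} \right| = n \log |\lambda_{s_{ij}}| + k \log n + \log |\hat{c}^{(i,j)}| + \eta_n,$$

with $\eta_n = \log \left| 1 + \zeta_{ij}(n) / \hat{c}^{(i,j)} \right|$. Since $\eta_n \to 0$ and $\log |\lambda_{s_{ij}}|$ is irrational,
the sequence $(P^n - P^*)^{(i,j)}$ is Benford by Lemma 2.2 and the fact that $(x_n + \alpha \log n + \beta)$
is u.d. mod 1 whenever $(x_n)$ is (e.g. [2, Lem. 2.8]).

Case 2: $|\lambda_{s_{ij}}| = |\lambda_{s_{ij}+1}| = \ldots = |\lambda_{t_{ij}}| =: |\lambda_{ij}|$ for some $t_{ij} > s_{ij}$.

Here several different eigenvalues of the same magnitude occur, such as e.g. conjugate
pairs of non-real eigenvalues. Let $k^{(i,j)}$ be the maximal degree of the polynomials
$c_\ell^{(i,j)}$, $\ell = s_{ij}, \ldots, t_{ij}$. As in Case 1, express (3) as

$$(P^n - P^*)^{(i,j)} = n^{k^{(i,j)}} \left( \lambda_{s_{ij}}^n \hat{c}_{s_{ij}}^{(i,j)} + \ldots + \lambda_{t_{ij}}^n \hat{c}_{t_{ij}}^{(i,j)} + |\lambda_{ij}|^n \zeta_{ij}(n) \right),$$

where $\hat{c}_\ell^{(i,j)} := \lim_{n\to\infty} n^{-k^{(i,j)}} c_\ell^{(i,j)} \in \mathbb{C}$ for $\ell = s_{ij}, \ldots, t_{ij}$, with $\hat{c}_\ell^{(i,j)} \neq 0$ for at least
one $\ell$, and $\zeta_{ij}(n) \to 0$ as $n \to \infty$. Write $\lambda_\ell$ as $|\lambda_{ij}| e^{i \arg \lambda_\ell}$ for $\ell = s_{ij}, \ldots, t_{ij}$. Consequently,

$$\log \left| (P^n - P^*)^{(i,j)} \right| = n \log |\lambda_{ij}| + k^{(i,j)} \log n
  + \log \left| \hat{c}_{s_{ij}}^{(i,j)} e^{i n \arg \lambda_{s_{ij}}} + \ldots + \hat{c}_{t_{ij}}^{(i,j)} e^{i n \arg \lambda_{t_{ij}}} + \zeta_{ij}(n) \right|.$$

Since $P$ is nonresonant, Lemma 3.4 applies with $m = t_{ij} - s_{ij} + 1$ and $\rho_0 = \log |\lambda_{ij}|$,
$\rho_1 = \frac{1}{2\pi} \arg \lambda_{s_{ij}}, \ldots, \rho_m = \frac{1}{2\pi} \arg \lambda_{t_{ij}}$. Thus $(P^n - P^*)^{(i,j)}$ is Benford. □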



Example 3.5. (The general two-dimensional case)

Let $d = 2$ and $P = \begin{bmatrix} 1 - x & x \\ y & 1 - y \end{bmatrix}$ with $x, y \in (0, 1)$. By Feller [10, p. 432],

$$P^n = \frac{1}{x + y} \begin{bmatrix} y & x \\ y & x \end{bmatrix} + \frac{(1 - x - y)^n}{x + y} \begin{bmatrix} x & -x \\ -y & y \end{bmatrix}, \tag{6}$$

from which it is clear that $\lambda_1 = 1$, $\lambda_2 = 1 - x - y$, and $P^* = \frac{1}{x + y} \begin{bmatrix} y & x \\ y & x \end{bmatrix}$. It
follows from (6) that each component of $(P^n - P^*)$ and $(P^{n+1} - P^n)$ is a multiple
of $(\lambda_2^n)$. By Theorem A the Markov chain with transition probability matrix $P$ is
Benford whenever $\log |1 - x - y|$ is irrational. On the other hand, by Lemma 2.4, $P$ is
not Benford if $\log |1 - x - y| \in \mathbb{Q}$. Thus for $d = 2$, nonresonance is (not only sufficient
but also) necessary for $P$ to be Benford. For $d \ge 3$, this is no longer true, see Example
3.7 below.
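The dichotomy for two-state chains can be observed numerically. The sketch below tabulates first digits of $|(P^n - P^*)^{(1,2)}| = x\,|1 - x - y|^n / (x + y)$ using Python's `decimal` module at 50 digits; the precision, the function name and the default range $n = 1, \ldots, 1000$ are illustrative assumptions, not choices made in the paper.

```python
from decimal import Decimal, getcontext
import math

getcontext().prec = 50  # far more precision than needed to read off a leading digit


def first_digit_frequencies(x: str, y: str, n_max: int = 1000) -> dict[int, float]:
    """First-digit frequencies of x * |1 - x - y|^n / (x + y) for n = 1..n_max."""
    xd, yd = Decimal(x), Decimal(y)
    lam = abs(1 - xd - yd)
    if lam == 0:
        raise ValueError("1 - x - y = 0: P^n - P* vanishes for n >= 1")
    counts = dict.fromkeys(range(1, 10), 0)
    term = xd / (xd + yd)
    for _ in range(n_max):
        term *= lam
        # The coefficient of a non-zero Decimal has no leading zeros.
        counts[term.as_tuple().digits[0]] += 1
    return {d: c / n_max for d, c in counts.items()}


if __name__ == "__main__":
    benford = first_digit_frequencies("0.3", "0.4")   # lambda_2 = 0.3, log irrational
    resonant = first_digit_frequencies("0.8", "0.1")  # lambda_2 = 0.1, log rational
    print([round(benford[d], 3) for d in range(1, 10)])
    print([round(resonant[d], 3) for d in range(1, 10)])
    print([round(math.log10(1 + 1 / d), 3) for d in range(1, 10)])
```

In the resonant case every term is $\frac{8}{9} \cdot 10^{-n}$, so the leading digit never changes, whereas for $\lambda_2 = 0.3$ the frequencies should settle close to the Benford values.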



Example 3.6. (The general three-dimensional case)

Let $d = 3$ and

$$P = \begin{bmatrix} x_1 & x_2 & 1 - x_1 - x_2 \\ y_1 & y_2 & 1 - y_1 - y_2 \\ z_1 & z_2 & 1 - z_1 - z_2 \end{bmatrix},$$

where $x_1, x_2, y_1, y_2, z_1, z_2 \in (0, 1)$ are
such that $x_1 + x_2$, $y_1 + y_2$, $z_1 + z_2$ all lie between 0 and 1. Solving the characteristic
equation yields the eigenvalues $\lambda_1 = 1$ and $\lambda_{2,3} = a \pm \sqrt{a^2 - b}$, with

$$a = \tfrac{1}{2} (x_1 + y_2 - z_1 - z_2) \quad \text{and} \quad b = x_1 y_2 - x_1 z_2 + y_1 z_2 - x_2 y_1 + x_2 z_1 - y_2 z_1.$$

Furthermore, using

$$c = 1 - y_2 + z_1 - y_2 z_1 + x_2 (-y_1 + z_1) + x_1 (-1 + y_2 - z_2) + z_2 + y_1 z_2 \neq 0,$$

one finds that

$$P^* = \frac{1}{c} \begin{bmatrix} z_1 - y_2 z_1 + y_1 z_2 & x_2 z_1 + z_2 - x_1 z_2 & 1 - x_1 - x_2 y_1 - y_2 + x_1 y_2 \\ z_1 - y_2 z_1 + y_1 z_2 & x_2 z_1 + z_2 - x_1 z_2 & 1 - x_1 - x_2 y_1 - y_2 + x_1 y_2 \\ z_1 - y_2 z_1 + y_1 z_2 & x_2 z_1 + z_2 - x_1 z_2 & 1 - x_1 - x_2 y_1 - y_2 + x_1 y_2 \end{bmatrix}.$$

If $a^2 \neq b$, then $P^n - P^* = \lambda_2^n B_2 + \lambda_3^n B_3$, where $B_\ell$ for $\ell = 2, 3$ are as in (4). There
are two cases to consider:

(i) $a^2 > b$.

In this case, $\lambda_{2,3}$ are real, and the dominant eigenvalue must be identified. If
$a > 0$, then $|\lambda_2| > |\lambda_3|$, hence $\lambda_2$ is dominant. If $B_2^{(i,j)} \neq 0$ for all $(i, j) \in
\{1, 2, 3\}^2$, then the Markov chain defined by $P$ is Benford if $\log |\lambda_2|$ is irrational.
In case there also exists $(i, j)$ with $B_2^{(i,j)} = 0$ yet $B_3^{(i,j)} \neq 0$, then for $P$ to be
Benford $\log |\lambda_3|$ has to be irrational as well. For $a < 0$ the roles of $\lambda_2$ and $\lambda_3$
have to be interchanged. If $a = 0$, then $P$ is resonant but may still be Benford,
see Example 3.7(ii).

(ii) $a^2 < b$.

Here $\lambda_{2,3}$ are conjugate and non-real, with $|\lambda_2| = |\lambda_3| = \sqrt{b}$. Thus $P$ is
nonresonant if and only if the numbers $1$, $\frac{1}{2} \log b$, $\frac{1}{2\pi} \arctan \sqrt{b/a^2 - 1}$ are
$\mathbb{Q}$-independent.

Finally, if $a^2 = b$ then $\lambda_2 = \lambda_3 = a$, so $P$ is Benford whenever $\log |a|$ is irrational.
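The quantities $a$, $b$, $c$ and the common row of $P^*$ in the three-state case are rational functions of the six free entries, so they can be checked with exact arithmetic. The Python sketch below does this for one sample matrix; the function names and the sample entries (chosen here so that $a = 0$ and $\lambda_{2,3} = \pm 0.3$) are assumptions made for illustration.

```python
from fractions import Fraction as F


def transition_matrix(x1, x2, y1, y2, z1, z2):
    """3x3 row-stochastic matrix with the six free entries of the general case."""
    return [[x1, x2, 1 - x1 - x2], [y1, y2, 1 - y1 - y2], [z1, z2, 1 - z1 - z2]]


def three_state_summary(x1, x2, y1, y2, z1, z2):
    """Return (a, b, c, pi): eigenvalue data and the common row pi of P*."""
    a = (x1 + y2 - z1 - z2) / 2
    b = x1 * y2 - x1 * z2 + y1 * z2 - x2 * y1 + x2 * z1 - y2 * z1
    c = 1 - y2 + z1 - y2 * z1 + x2 * (-y1 + z1) + x1 * (-1 + y2 - z2) + z2 + y1 * z2
    pi = ((z1 - y2 * z1 + y1 * z2) / c,
          (x2 * z1 + z2 - x1 * z2) / c,
          (1 - x1 - x2 * y1 - y2 + x1 * y2) / c)
    return a, b, c, pi


def is_stationary(p, pi):
    """True if pi is a probability vector with pi P = pi."""
    return sum(pi) == 1 and all(sum(pi[k] * p[k][j] for k in range(3)) == pi[j] for j in range(3))


if __name__ == "__main__":
    entries = (F(4, 10), F(5, 10), F(7, 10), F(2, 10), F(4, 10), F(2, 10))
    a, b, c, pi = three_state_summary(*entries)
    print(a, b, c, pi, is_stationary(transition_matrix(*entries), pi))
```

Since $\lambda_2 + \lambda_3 = 2a$ and $\lambda_2 \lambda_3 = b$, the trace of $P$ equals $1 + 2a$, which offers a second quick check on any chosen entries.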

The next example shows that for a Markov chain to be Benford, nonresonance is not 
necessary in general. 

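Example 3.7. (Markov chains that are resonant yet Benford)

(i) Eigenvalues with rational argument: Let $d = 3$ and $P = \begin{bmatrix} 0.4 & 0.5 & 0.1 \\ 0.4 & 0.3 & 0.3 \\ 0.6 & 0.1 & 0.3 \end{bmatrix}$.
The eigenvalues are $\lambda_1 = 1$ and $\lambda_{2,3} = \pm 0.2i$. With $\Lambda_0 = \{0.2i\}$ therefore
$\frac{1}{2\pi} \arg \Lambda_0 = \{\frac{1}{4}\} \subset \mathbb{Q}$, so $P$ is resonant. However, spectral decomposition shows
that $B_3 = \overline{B_2}$, i.e., $B_2$, $B_3$ are conjugates, and each component of $B_2$ has
non-zero real and imaginary part. Thus for every $(i, j) \in \{1, 2, 3\}^2$,

$$\left| (P^n - P^*)^{(i,j)} \right| = \left| 2 \Re \left( (0.2i)^n B_2^{(i,j)} \right) \right| = \begin{cases} 2 \cdot 0.2^n \, |\Re B_2^{(i,j)}| & \text{if } n \text{ is even,} \\ 2 \cdot 0.2^n \, |\Im B_2^{(i,j)}| & \text{if } n \text{ is odd,} \end{cases}$$

and $(P^n - P^*)^{(i,j)}$ is Benford.

(ii) Two real eigenvalues of opposite sign: Let $d = 3$ and $P = \begin{bmatrix} 0.4 & 0.5 & 0.1 \\ 0.7 & 0.2 & 0.1 \\ 0.4 & 0.2 & 0.4 \end{bmatrix}$.
The eigenvalues are $\lambda_1 = 1$ and $\lambda_{2,3} = \pm 0.3$. It can be checked that each
component of $B_2 \pm B_3$ is non-zero. Thus for every $(i, j) \in \{1, 2, 3\}^2$,

$$(P^n - P^*)^{(i,j)} = 0.3^n \left( B_2^{(i,j)} + (-1)^n B_3^{(i,j)} \right),$$

which is Benford because $\log 0.3 \notin \mathbb{Q}$.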



Remarks on general Markov chains: 

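(i) Theorem A cannot be applied to Markov chains that fail to be irreducible.
However, every finite-state Markov chain can be decomposed into classes of recurrent
and transient states. Hence, the transition matrix $P$ can be block-partitioned as

$$P = \begin{bmatrix} P_1 & 0 & \cdots & 0 & 0 \\ 0 & P_2 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & P_r & 0 \\ B^{(1)} & B^{(2)} & \cdots & B^{(r)} & A \end{bmatrix},$$

where $P_1, P_2, \ldots, P_r$ are the transition matrices of the $r$ disjoint recurrent classes, and
$B^{(1)}, \ldots, B^{(r)}$ denote the transition probabilities from the collection of transient
states into each recurrent class. As $n \to \infty$,

$$P^n = \begin{bmatrix} P_1^n & 0 & \cdots & 0 & 0 \\ 0 & P_2^n & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & P_r^n & 0 \\ L_n^{(1)} & L_n^{(2)} & \cdots & L_n^{(r)} & A^n \end{bmatrix}
\to \begin{bmatrix} P_1^* & 0 & \cdots & 0 & 0 \\ 0 & P_2^* & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & P_r^* & 0 \\ S B^{(1)} P_1^* & S B^{(2)} P_2^* & \cdots & S B^{(r)} P_r^* & 0 \end{bmatrix},$$

where $L_n^{(j)} = \sum_{\ell=0}^{n-1} A^\ell B^{(j)} P_j^{n-1-\ell}$ for $j = 1, 2, \ldots, r$, and $S = \sum_{\ell=0}^{\infty} A^\ell$. Theorem A
can be applied separately to the transition matrices $P_j$ associated with the recurrent
classes. Consequently, if $P_1, P_2, \ldots, P_r$ are Benford, then the corresponding
components of $P$ are also Benford. Additionally, if $A$ is nonresonant, then that part follows
BL as well. The only remaining parts are formed by the sequences $(L_n^{(j)})$ and depend
on the (nonautonomous) summation of the powers of $A$. Their Benford properties are
beyond the scope of this paper.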

(ii) For an irreducible Markov chain that is not aperiodic, but rather periodic
with period $p > 1$, Definition 2.7 still makes sense, provided that $P^*$ is understood as
the unique row-stochastic matrix with $P^* P = P^*$. However, such a chain cannot be
Benford since for every $(i, j) \in \{1, \ldots, d\}^2$ one can choose $k \in \{0, \ldots, p - 1\}$ such that

$$\left| (P^n - P^*)^{(i,j)} \right| = (P^*)^{(i,j)} > 0, \quad \forall n \in \mathbb{N} \setminus (k + p\mathbb{N}).$$

Similarly, each component of $(P^{n+1} - P^n)$ equals zero at least $(p - 2)/p$ of the time
and thus cannot be Benford either whenever $p \ge 3$. The distribution of significands of
$(P^{n+1} - P^n)^{(i,j)}$ observed in this situation is a convex combination of BL and a pure
point mass, see [5, Cor. 6]. Only in the case $p = 2$ is it possible for each component
of $(P^{n+1} - P^n)$ to be either Benford or eventually zero.

(iii) Although this paper deals with finite-state Markov chains only, it is worth
noting that chains with infinitely many states may also obey BL in one way or the
other. For a very simple example, let $0 < p < 1$ and consider the homogeneous
random walk on $\mathbb{Z}$ with

$$P^{(i,j)} = \begin{cases} p^2 & \text{if } j = i - 1, \\ 2p(1 - p) & \text{if } j = i, \\ (1 - p)^2 & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}$$

Clearly, this Markov chain is irreducible and aperiodic. It is (null-)recurrent if $p = \frac{1}{2}$,
and transient otherwise. For all $(i, j) \in \mathbb{Z}^2$ and $n \in \mathbb{N}$,

$$(P^n)^{(i,j)} = \binom{2n}{n + j - i} p^{n - j + i} (1 - p)^{n + j - i},$$

and an application of Stirling's formula shows that $(P^n)^{(i,j)}$ is Benford if and only
if $\log (4p(1 - p))$ is irrational. For all but countably many $p$, therefore, $(P^n)^{(i,j)}$ is
Benford for every $(i, j) \in \mathbb{Z}^2$. Note that one of the excluded values is $p = \frac{1}{2}$, i.e. the
recurrent case. For recurrent chains virtually every imaginable behavior of significant
digits or significands can be manufactured by means of advanced ergodic theory tools,
see [3] and the references therein.
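The $n$-step probabilities of this random walk admit a binomial closed form: one step has the same law as the net effect of two independent half-steps, each going down with probability $p$ and up with probability $1 - p$, which suggests $(P^n)^{(i,j)} = \binom{2n}{n+j-i} p^{n-j+i} (1-p)^{n+j-i}$. The Python sketch below compares that formula with a direct convolution in exact arithmetic; the derivation and the function names are this sketch's own assumptions rather than quotations from the text.

```python
from fractions import Fraction as F
from math import comb


def n_step_probabilities(p: F, n: int) -> dict[int, F]:
    """Distribution of the displacement j - i after n steps, by direct convolution."""
    step = {-1: p * p, 0: 2 * p * (1 - p), 1: (1 - p) ** 2}
    dist = {0: F(1)}
    for _ in range(n):
        new: dict[int, F] = {}
        for pos, prob in dist.items():
            for move, q in step.items():
                new[pos + move] = new.get(pos + move, F(0)) + prob * q
        dist = new
    return dist


def binomial_formula(p: F, n: int, k: int) -> F:
    """C(2n, n+k) p^(n-k) (1-p)^(n+k): probability of displacement k after n steps."""
    if abs(k) > n:
        return F(0)
    return comb(2 * n, n + k) * p ** (n - k) * (1 - p) ** (n + k)


if __name__ == "__main__":
    p = F(1, 3)
    dist = n_step_probabilities(p, 6)
    print(all(dist[k] == binomial_formula(p, 6, k) for k in range(-6, 7)))
```

For fixed displacement, Stirling's formula turns the binomial coefficient into a factor of order $4^n / \sqrt{n}$, which is where the quantity $4p(1-p)$ in the Benford criterion comes from.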



4 Almost all Markov chains are Benford 

The second main theoretical objective of this paper is to show that Benford behavior 
is typical in finite-state Markov chains. Indeed, if the transition probabilities of the 
chain are chosen at random, independently and in any continuous manner, then the 
chain almost always, i.e. with probability one, obeys BL. To formulate this more 
precisely, the following terminology will be used. 

Definition 4.1. A random ($d$-state) Markov chain is a random $d \times d$-matrix $\mathbf{P}$,
defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and taking values in $\mathcal{P}_d$, i.e., each row
$\mathbf{X}_1, \ldots, \mathbf{X}_d$ of $\mathbf{P}$ is a random vector taking values in the standard $d$-simplex

$$\Delta_d := \Bigl\{ (x_1, \ldots, x_d) \in \mathbb{R}^d : x_j \ge 0 \text{ for all } 1 \le j \le d, \text{ and } \sum_{j=1}^{d} x_j = 1 \Bigr\}.$$

A random vector $\mathbf{X} : \Omega \to \Delta_d$ is continuous if its distribution on $\Delta_d$ is continuous
w.r.t. the (normalised) Lebesgue measure on $\Delta_d$, that is, if $\mathbb{P}(\mathbf{X} \in A) = 0$ whenever
$A \subset \Delta_d$ is a nullset.

With this terminology, it is the purpose of the present section to illustrate and prove

Theorem B. If the transition probabilities (i.e. the rows) of a random Markov chain
$\mathbf{P}$ are independent and continuous, then $\mathbf{P}$ is Benford with probability one.

Before giving a full proof for Theorem B, the special case of a random two-state
chain will be examined to show how independence and continuity together allow
the application of Theorem A. The case $d = 2$ is especially transparent since the
eigenvalue functions are simple and explicit, unlike for the general case where the
eigenvalues are only known implicitly, and the Implicit Function Theorem has to be
resorted to.

Example 4.2. Consider the random two-state Markov chain

$$\mathbf{P} = \begin{bmatrix} 1 - X & X \\ Y & 1 - Y \end{bmatrix},$$

where the random variables $X$ and $Y$ are i.i.d. (absolutely) continuous random
variables on the unit interval $[0, 1]$. Since $X$ and $Y$ are continuous, each of the four
entries of $\mathbf{P}$ is strictly positive with probability one, so the chain is irreducible and
aperiodic with probability one. Since $\mathbf{P}$ is random, the second-largest eigenvalue is a
random variable $Z$ which, by Example 3.5, satisfies $Z = 1 - X - Y$. Since $X$ and
$Y$ are independent and continuous, $Z$ is also continuous, and hence the probability
that $Z$ is in any given countable set is zero. But this implies that the probability
of $\log |Z|$ being rational is zero, which in turn shows that with probability one, $\mathbf{P}$ is
nonresonant, and hence Benford, by Theorem A.
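A small Monte Carlo experiment illustrates this almost-sure statement for two-state chains. The sketch below draws $X$ and $Y$ uniformly on $(0, 1)$, which is one admissible continuous distribution among many, and measures how often the $(1,2)$ entry of $|P^n - P^*|$, namely $X |Z|^n / (X + Y)$, has leading digit 1. The sample sizes, the seed and the function names are arbitrary choices made here, and double-precision logarithms are assumed to be accurate enough for counting.

```python
import math
import random


def digit_one_frequency(x: float, y: float, n_max: int) -> float:
    """Share of n in 1..n_max for which x * |1-x-y|^n / (x+y) has first digit 1."""
    z = abs(1.0 - x - y)
    log_z, log_c = math.log10(z), math.log10(x / (x + y))
    threshold = math.log10(2.0)  # first digit is 1 exactly when the significand is below 2
    hits = 0
    for n in range(1, n_max + 1):
        frac = (n * log_z + log_c) % 1.0  # log10 of the significand
        hits += frac < threshold
    return hits / n_max


def simulate(chains: int = 200, n_max: int = 2000, seed: int = 0) -> float:
    """Average digit-1 frequency over independently drawn two-state chains."""
    rng = random.Random(seed)
    total, done = 0.0, 0
    while done < chains:
        x, y = rng.random(), rng.random()
        if x == 0.0 or x + y == 1.0:
            continue  # probability-zero draws for which a logarithm is undefined
        total += digit_one_frequency(x, y, n_max)
        done += 1
    return total / chains


if __name__ == "__main__":
    print(round(simulate(), 4), round(math.log10(2.0), 4))
```

Individual chains whose $\log |Z|$ lies very close to a rational number with small denominator converge slowly, so the average over many chains is the more informative figure.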

Similarly to the analysis of Newton's method in [3], a key property in the present
Markov chain setting is the real-analyticity of certain functions, notably the eigenvalue
functions. Recall that a function $f : U \to \mathbb{C}$ is real-analytic whenever it can, in the
neighborhood of every point in its domain $U$ (an open subset of $\mathbb{R}^\ell$ for some $\ell \ge 1$),
be written as a convergent power series. Clearly, every real-analytic function is $C^\infty$,
i.e. has derivatives of all orders. An important property of real-analytic functions not
shared by arbitrary $\mathbb{C}$-valued $C^\infty$-functions defined on $U$ is that the set $\{x \in U :
f(x) = 0\}$ is a nullset unless $f$ vanishes identically on $U$.

The proof of Theorem B will be based on several preliminary results. First, given
$a = (a_1, \ldots, a_d) \in \mathbb{C}^d$, let $p_a : \mathbb{C} \to \mathbb{C}$ denote the polynomial

$$p_a(z) = z^d + a_1 z^{d-1} + \ldots + a_{d-1} z + a_d.$$

By the Fundamental Theorem of Algebra, $p_a$ has exactly $d$ zeroes (counted with
multiplicities). If $p_a$ and $p_a'$, or more generally, if $p_a$ and $p_b$ with $a \neq b$ have a common
zero then a universal polynomial relation must necessarily be satisfied by $a$ and $b$.
Only a special case of this elementary fact is required here, and since no reference is
known to the authors, a proof is included for completeness.

Lemma 4.3. For every integer $d > 1$, there exists a non-trivial polynomial $Q_d$ in
$2d - 1$ variables with the following property: Whenever $a = (a_1, \dots, a_d) \in \mathbb{C}^d$,
$b = (b_1, \dots, b_{d-1}) \in \mathbb{C}^{d-1}$, and $p_a(z_0) = p_b(z_0) = 0$ for some $z_0 \in \mathbb{C}$, then
$Q_d(a,b) := Q_d(a_1, \dots, a_d, b_1, \dots, b_{d-1}) = 0$.

Proof. For $d = 2$, let $Q_2(a,b) := a_1 b_1 - a_2 - b_1^2$ for all $a = (a_1, a_2) \in \mathbb{C}^2$ and $b = b_1 \in \mathbb{C}$.
To see that $Q_2$ has the desired property, note that if $p_a(z_0) = 0 = p_b(z_0)$, then
$z_0^2 + a_1 z_0 + a_2 = 0$ and $z_0 = -b_1$, hence $Q_2(a,b) = 0$. Assume now that $Q_d$ has already
been constructed. For every $a \in \mathbb{C}^{d+1}$ and $b \in \mathbb{C}^d$ let $\rho = a_2 - b_2 - (a_1 - b_1) b_1 \in \mathbb{C}$,
as well as

$$c = \bigl(a_3 - b_3 - (a_1 - b_1) b_2, \; \dots, \; a_d - b_d - (a_1 - b_1) b_{d-1}, \; a_{d+1} - (a_1 - b_1) b_d\bigr) \in \mathbb{C}^{d-1},$$
and define

$$Q_{d+1}(a,b) := \rho^{\,1 + \deg Q_d} \, Q_d\Bigl(b, \frac{c}{\rho}\Bigr),$$

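where $\deg \bigl(\sum_j c_j x_1^{n_{1,j}} x_2^{n_{2,j}} \cdots x_\ell^{n_{\ell,j}}\bigr) := \max\{n_{1,j} + \dots + n_{\ell,j} : c_j \ne 0\}$. Clearly, $Q_{d+1}$
is a polynomial in $2d + 1$ variables, and $Q_{d+1} \ne 0$. If $p_a(z_0) = p_b(z_0) = 0$ for some
$z_0 \in \mathbb{C}$, then

$$0 = p_a(z_0) - \bigl(z_0 + (a_1 - b_1)\bigr) p_b(z_0)
   = \sum_{j=1}^{d-1} \bigl(a_{j+1} - b_{j+1} - (a_1 - b_1) b_j\bigr) z_0^{d-j} + a_{d+1} - (a_1 - b_1) b_d . \qquad (7)$$

If $\rho = 0$, then clearly $Q_{d+1}(a,b) = 0$. Otherwise, it is easy to check that (7) implies
$p_{c/\rho}(z_0) = 0$, in which case $Q_d(b, c/\rho) = 0$, by assumption. In either case, therefore,
$Q_{d+1}(a,b) = 0$. □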
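The reduction step in this proof can be checked numerically. The sketch below is illustrative only: it assumes the quantities $\rho$ and $c$ exactly as defined in the proof, verifies the polynomial identity $p_a(z) - (z + a_1 - b_1) p_b(z) = \rho z^{d-1} + c_1 z^{d-2} + \dots + c_{d-1}$ at a random point, and confirms that $Q_2$ vanishes when $p_a$ and $p_b$ share a zero. The helper names `reduce_pair` and `q2` are hypothetical.

```python
import numpy as np

def reduce_pair(a, b):
    """Given coefficients a = (a_1..a_{d+1}) and b = (b_1..b_d) of the monic
    polynomials p_a and p_b, return (rho, c) from the induction step."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    d = len(b)
    shift = a[0] - b[0]                       # a_1 - b_1
    rho = a[1] - b[1] - shift * b[0]          # a_2 - b_2 - (a_1 - b_1) b_1
    # c_k = a_{k+2} - b_{k+2} - (a_1 - b_1) b_{k+1} for k = 1..d-2,
    # followed by the constant term a_{d+1} - (a_1 - b_1) b_d.
    c = np.append(a[2:d] - b[2:d] - shift * b[1:d - 1], a[d] - shift * b[d - 1])
    return rho, c

rng = np.random.default_rng(0)
d = 4
a = rng.normal(size=d + 1) + 1j * rng.normal(size=d + 1)
b = rng.normal(size=d) + 1j * rng.normal(size=d)
rho, c = reduce_pair(a, b)

z = 0.7 - 0.2j                                # arbitrary test point
lhs = np.polyval(np.r_[1, a], z) - (z + a[0] - b[0]) * np.polyval(np.r_[1, b], z)
rhs = np.polyval(np.r_[rho, c], z)
print(abs(lhs - rhs))                         # difference of the two sides

def q2(a, b1):
    """Q_2(a, b) = a_1 b_1 - a_2 - b_1^2 from the base case d = 2."""
    return a[0] * b1 - a[1] - b1 ** 2

z0 = 1.5 + 0.5j                               # prescribed common zero
a1 = 2.0 - 1.0j
a2 = -(z0 ** 2 + a1 * z0)                     # forces p_a(z0) = 0
print(abs(q2((a1, a2), -z0)))                 # p_b(z) = z + b_1 with b_1 = -z0
```

Both printed values should be zero up to rounding error.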

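Corollary 4.4. For every integer $d > 1$, there exists a non-trivial polynomial $Q_d^*$ in
$d$ variables such that $Q_d^*(a) = 0$ whenever $p_a(z_0) = p_a'(z_0) = 0$ for some $z_0 \in \mathbb{C}$.

Proof. Take $Q_d^*(a) = Q_d(a, b)$ with $b = \bigl(\frac{d-1}{d} a_1, \frac{d-2}{d} a_2, \dots, \frac{2}{d} a_{d-2}, \frac{1}{d} a_{d-1}\bigr)$. □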
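For $d = 2$ the corollary can be checked by hand. The following worked example is a sketch that assumes the base-case formula $Q_2(a,b) = a_1 b_1 - a_2 - b_1^2$ from the proof of Lemma 4.3 and the choice $b = \frac{1}{2} a_1$, which is the monic normalization of $p_a'(z) = 2z + a_1$:

```latex
Q_2^*(a) = Q_2\!\left(a, \tfrac{1}{2} a_1\right)
         = a_1 \cdot \tfrac{1}{2} a_1 - a_2 - \tfrac{1}{4} a_1^2
         = \tfrac{1}{4}\left(a_1^2 - 4 a_2\right).
```

Up to the factor $\frac{1}{4}$ this is the discriminant of $p_a(z) = z^2 + a_1 z + a_2$, which vanishes exactly when $p_a$ has a double root.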

This corollary will now be used to show that if a stochastic matrix $P_0$ is invertible
and has distinct non-zero eigenvalues, then all stochastic matrices $P$ sufficiently
close to $P_0$ also are invertible and have distinct non-zero eigenvalues. In fact, these
eigenvalues are real-analytic functions of $P$. To formulate this efficiently, for every
$P_0 \in \mathcal{P}_d$ and $\varepsilon > 0$ denote by $B_\varepsilon(P_0)$ the open ball with radius $\varepsilon$ centered at $P_0$, i.e.

$$B_\varepsilon(P_0) = \bigl\{ P \in \mathcal{P}_d : |P^{(i,j)} - P_0^{(i,j)}| < \varepsilon \text{ for all } 1 \le i, j \le d \bigr\}.$$

Lemma 4.5. Suppose $P_0 \in \mathcal{P}_d$ is invertible and has $d$ distinct non-zero eigenvalues.
Then there exists $\varepsilon > 0$ and $d - 1$ non-constant real-analytic functions
$\lambda_2, \dots, \lambda_d : B_\varepsilon(P_0) \to \mathbb{C}$ such that, for every $P \in B_\varepsilon(P_0)$,

(i) $1, \lambda_2(P), \dots, \lambda_d(P)$ are the eigenvalues of $P$, and $\lambda_2(P) \cdot \ldots \cdot \lambda_d(P) \ne 0$;

(ii) $\lambda_i(P) \ne \overline{\lambda_j(P)}$ whenever $i \ne j$, unless $\lambda_i = \overline{\lambda_j}$ on $B_\varepsilon(P_0)$.

Proof. Note first that by the continuity of $(P, z) \mapsto \det(z I_{d \times d} - P) = \psi_P(z)$, there
exists $\delta > 0$ such that every $P \in B_\delta(P_0)$ is invertible and has distinct non-zero
eigenvalues. Thus the characteristic polynomial $\psi_P$ of $P$ has $d - 1$ distinct non-zero
roots different from 1. Let $z_0$ be one of those roots for $P = P_0$. Since $z_0$ is a simple root,
$\psi_{P_0}'(z_0) \ne 0$, so by the Implicit Function Theorem [15, Theorem 2.3.5], $z_0$ depends
real-analytically on the coefficients of $\psi_P$ which themselves are real-analytic (in fact
polynomial) functions of the entries of $P$. More formally, there exists $\varepsilon < \delta$ and
a real-analytic function $g : B_\varepsilon(P_0) \to \mathbb{C}$ with $g(P_0) = z_0$ such that $\psi_P\bigl(g(P)\bigr) = 0$
for all $P \in B_\varepsilon(P_0)$. Overall, there exists $\varepsilon > 0$ and $d - 1$ real-analytic functions
$\lambda_i : B_\varepsilon(P_0) \to \mathbb{C}$ satisfying (i); note that $\lambda_1 = 1$ by Lemma [2^ To see that $\lambda_2, \dots, \lambda_d$
are not constant on $B_\varepsilon(P_0)$, suppose by way of contradiction that $\lambda_i(P) = \lambda_i(P_0) \ne 1$
for some $2 \le i \le d$ and all $P \in B_\varepsilon(P_0)$. In this case, the real-analytic function
$P \mapsto \psi_P\bigl(\lambda_i(P_0)\bigr)$ vanishes identically on $B_\varepsilon(P_0)$, and hence on all of $\mathcal{P}_d$. Since $I_{d \times d} \in \mathcal{P}_d$,
this obviously contradicts $\psi_{I_{d \times d}}\bigl(\lambda_i(P_0)\bigr) = (\lambda_i(P_0) - 1)^d \ne 0$. Consequently, none of
the functions $\lambda_2, \dots, \lambda_d : B_\varepsilon(P_0) \to \mathbb{C}$ is constant.

To show (ii), assume that $\lambda_i(P_1) = \overline{\lambda_j(P_1)}$ for some $i \ne j$ and $P_1 \in B_\varepsilon(P_0)$. Thus
$\lambda_i(P_1) \in \mathbb{C} \setminus \mathbb{R}$, since if $\lambda_i(P_1)$ were real, then $\lambda_i(P_1) = \lambda_j(P_1)$, which is impossible since
the eigenvalues are distinct. Since all matrices in $\mathcal{P}_d$ are real, their non-real eigenvalues
occur in conjugate pairs. Hence, for all $P$ sufficiently close to $P_1$, the number $\overline{\lambda_j(P)}$
is an eigenvalue of $P$ which, by continuity, can only be $\lambda_i(P)$. Consequently, $\lambda_i$ and
$\overline{\lambda_j}$ coincide locally near $P_1$ and therefore, by real-analyticity, on all of $B_\varepsilon(P_0)$. □

By means of the above auxiliary results, several almost sure properties of random 
Markov chains can be identified. 

Lemma 4.6. If the rows of the random Markov chain $P$ are independent and continuous
then, with probability one,

(i) P is irreducible, aperiodic, and invertible; 

(ii) P has d distinct non-zero eigenvalues; and 

(iii) $P$ is nonresonant.

Proof. Fix $P$ and assume its rows $X_1, \dots, X_d$ are independent and continuous.

(i) Since each $X_i$ is continuous, $\mathbb{P}(X_i \in A) = 0$ for every Lebesgue nullset $A \subset \Delta_d$,
so in particular $\mathbb{P}(X_{i,j} \in \{0, 1\}) = 0$ for all $i$ and $j$. With probability one, therefore,
$P^{(i,j)} \in (0, 1)$ for all $i$ and $j$, and $P$ is irreducible and aperiodic. To see that $P$ is
almost surely invertible, note that $P \mapsto \det P$ is a non-constant, real-analytic function
on $\mathcal{P}_d$. With $N = \{(x_1, \dots, x_d) \in \Delta_d \times \dots \times \Delta_d : \det(x_1, \dots, x_d) = 0\}$,

$$\mathbb{P}(\det P = 0) = \int_N \mathrm{d}\mathbb{P}(x_1, \dots, x_d) = \int \!\cdots\! \int_N \mathrm{d}\mathbb{P}(x_1) \cdots \mathrm{d}\mathbb{P}(x_d)$$

$$= \int \!\cdots\! \int \Bigl( \int_{\{x_1 : (x_1, \dots, x_d) \in N\}} \mathrm{d}\mathbb{P}(x_1) \Bigr) \mathrm{d}\mathbb{P}(x_2) \cdots \mathrm{d}\mathbb{P}(x_d) = 0,$$

where the second equality follows from the independence of $X_1, \dots, X_d$, the third
from Fubini's theorem, and the fourth from the continuity of the $X_i$.

(ii) There exist $d$ non-constant polynomial functions $q_1, \dots, q_d : \mathcal{P}_d \to \mathbb{R}$ such that

$$\psi_P(z) = \det(z I_{d \times d} - P) = z^d + q_1(P) z^{d-1} + \dots + q_{d-1}(P) z + q_d(P)$$

holds for all $P \in \mathcal{P}_d$ and $z \in \mathbb{C}$; for example, $q_1(P) = -\sum_{i=1}^{d} P^{(i,i)}$ and
$q_d(P) = (-1)^d \det P$. Consequently, $q(P) := Q_d^*\bigl(q_1(P), \dots, q_d(P)\bigr)$ defines a non-constant
real-analytic (in fact, polynomial) map $q : \mathcal{P}_d \to \mathbb{R}$, and since $z_0$ is a multiple eigenvalue
of $P$ if and only if $\psi_P(z_0) = \psi_P'(z_0) = 0$, Corollary 4.4 implies that

$$\{P \in \mathcal{P}_d : P \text{ has multiple eigenvalues}\} \subset \{P \in \mathcal{P}_d : q(P) = 0\} .$$

As before, by Fubini's Theorem $\mathbb{P}(q(P) = 0) = 0$, showing that with probability one
all eigenvalues of $P$ are simple.

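(iii) For every $\rho \in \mathbb{Q}$ define the real-analytic auxiliary function $\Phi_\rho : \mathbb{R}^2 \to \mathbb{R}$ by
$\Phi_\rho(x) := (x_1^2 + x_2^2 - 10^{2\rho})^2$, and also $\Theta : \mathbb{R}^4 \to \mathbb{R}$ as $\Theta(x) := (x_1^2 + x_2^2 - x_3^2 - x_4^2)^2$.
By (i) and (ii), $P$ almost surely satisfies the hypotheses of Lemma 4.5, so let $P_0$, $\varepsilon$,
and $\lambda_2, \dots, \lambda_d$ be as in Lemma 4.5, and define real-analytic functions $\Phi_{\rho,i}$ and $\Theta_{i,j}$
on $B_\varepsilon(P_0)$ as

$$\Phi_{\rho,i}(P) := \Phi_\rho\bigl(\Re \lambda_i(P), \Im \lambda_i(P)\bigr) = \bigl(|\lambda_i(P)|^2 - 10^{2\rho}\bigr)^2, \quad \forall i : 2 \le i \le d,$$

and, for all $2 \le i, j \le d$,

$$\Theta_{i,j}(P) := \Theta\bigl(\Re \lambda_i(P), \Im \lambda_i(P), \Re \lambda_j(P), \Im \lambda_j(P)\bigr) = \bigl(|\lambda_i(P)|^2 - |\lambda_j(P)|^2\bigr)^2.$$

Finally, let $F_\rho : B_\varepsilon(P_0) \to \mathbb{R}$ be defined as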

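The definition of $F_\rho$ becomes transparent upon noticing that $F_\rho(P) = 0$ for some
$\rho \in \mathbb{Q}$ whenever $P$ is invertible and resonant. Next, it will be shown that $F_\rho$ does
not vanish identically on $B_\varepsilon(P_0)$. To see this, note first that if $P \in B_\varepsilon(P_0)$, then also
$(1 - \delta) P + \delta I_{d \times d} \in B_\varepsilon(P_0)$ for all sufficiently small $\delta > 0$. Moreover, if $\Phi_{\rho,i}(P) = 0$
for some $i = 2, \dots, d$, then

$$\Phi_{\rho,i}\bigl((1 - \delta) P + \delta I_{d \times d}\bigr) = \Bigl( \bigl((1 - \delta) \Re \lambda_i(P) + \delta\bigr)^2 + (1 - \delta)^2 \Im \lambda_i(P)^2 - 10^{2\rho} \Bigr)^2$$

$$= \delta^2 \Bigl( (2 - \delta) \bigl(\Re \lambda_i(P) - |\lambda_i(P)|^2\bigr) + \delta \bigl(1 - \Re \lambda_i(P)\bigr) \Bigr)^2 > 0 ,$$

provided that $\delta > 0$ is small enough. (Recall that $1 - \Re \lambda_i(P) > 0$ whenever
$P \in B_\varepsilon(P_0)$.) Similarly, if $\Theta_{i,j}(P) = 0$ for some $2 \le i < j \le d$ with $\lambda_i \ne \overline{\lambda_j}$ and $\lambda_i(P) \ne 0$,
then a short calculation confirms that, for all $\delta > 0$ sufficiently small,

$$\Theta_{i,j}\bigl((1 - \delta) P + \delta I_{d \times d}\bigr) = \delta^2 (1 - \delta)^2 \, \frac{|\lambda_i(P) - \lambda_j(P)|^2 \, |\lambda_i(P) - \overline{\lambda_j(P)}|^2}{|\lambda_i(P)|^2} > 0 .$$

Overall, $F_\rho$ does not vanish identically on $B_\varepsilon(P_0)$. As every $P \in B_\varepsilon(P_0)$ is invertible,

$$\{P \in B_\varepsilon(P_0) : P \text{ is resonant}\} \subset \bigcup_{\rho \in \mathbb{Q}} \{P \in B_\varepsilon(P_0) : F_\rho(P) = 0\} .$$

Since $F_\rho$ is real-analytic and non-constant, $\{P \in B_\varepsilon(P_0) : F_\rho(P) = 0\}$ is a nullset for
every $\rho \in \mathbb{Q}$, and so is $\bigcup_{\rho \in \mathbb{Q}} \{P \in B_\varepsilon(P_0) : F_\rho(P) = 0\}$. Analogously to (i) and (ii),
therefore, $\mathbb{P}(P \text{ is resonant}) = 0$. □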
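Parts (i) and (ii) of Lemma 4.6 lend themselves to a quick Monte Carlo sanity check. The sketch below is illustrative and makes its own assumptions: rows are drawn independently and uniformly from the simplex (a Dirichlet distribution with all parameters equal to 1, one particular continuous choice), and a fixed numerical tolerance stands in for exact comparisons. Part (iii) cannot be tested this way, since rationality is not decidable in floating-point arithmetic. The function names are hypothetical.

```python
import numpy as np

def random_stochastic_matrix(d, rng):
    """d x d matrix whose rows are independent and uniform on the simplex."""
    return rng.dirichlet(np.ones(d), size=d)

def check_almost_sure_properties(P, tol=1e-9):
    """Return (all entries positive, invertible, eigenvalues distinct)."""
    eig = np.linalg.eigvals(P)
    positive = bool(np.all(P > 0))            # implies irreducible and aperiodic
    invertible = bool(abs(np.linalg.det(P)) > tol)
    # Pairwise eigenvalue gaps; the identity masks the zero diagonal.
    gaps = np.abs(eig[:, None] - eig[None, :]) + np.eye(len(eig))
    distinct = bool(np.min(gaps) > tol)
    return positive, invertible, distinct

rng = np.random.default_rng(1)
results = [check_almost_sure_properties(random_stochastic_matrix(4, rng))
           for _ in range(1000)]
print(sum(all(r) for r in results), "of", len(results), "samples pass all checks")
```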

Proof of Theorem [B]. Let $X_1, \dots, X_d$ denote the random transition probabilities (row
vectors) of the random d-matrix $P$. If $X_1, \dots, X_d$ are independent and continuous,
then by Lemma 4.6, $P$ is almost surely irreducible, aperiodic, and nonresonant. By
Theorem [A], this implies that $P$ is Benford with probability one. □

Remark 4.7. (i) It is clear that without independence, or without continuity, Lemma
4.6 and Theorem [B] are generally false. For example, for the conclusion of Lemma
4.6 to hold it is not enough to assume that the distribution on $\Delta_d$ of each row of $P$
is atomless. As very simple examples show, under this weaker assumption, $P$ may,
with positive probability, be reducible and have multiple or zero eigenvalues. Even if
Lemma 4.6(i,ii) hold with probability one, $P$ may still be resonant and not Benford.
To see this, consider the random three-state Markov chain

$$P = \frac{1}{40} \begin{bmatrix} X + 4 & X & 36 - 2X \\ Y & Y + 4 & 36 - 2Y \\ Z + 2 & Z + 2 & 36 - 2Z \end{bmatrix},$$

where $X, Y, Z$ are independent and uniformly distributed on $[0, 1]$. The eigenvalues
of $P$ are

$$\lambda_1 = 1, \quad \lambda_2 = 0.1, \quad \lambda_3 = \tfrac{1}{40}(X + Y - 2Z).$$

Note that $|\lambda_3| \le 0.05 < \lambda_2$. Clearly, $P$ is resonant with probability one, since
$\log \lambda_2 = -1$ is rational, and Lemma 4.6(iii) fails. Perhaps even more importantly,
Theorem [B] fails as well since, as spectral decomposition shows, $B_2 \ne 0$ with
probability one and hence $\mathbb{P}(P \text{ is Benford}) = 0$.

(ii) With hardly any effort, the tools employed in the proof of Lemmas 4.5 and
4.6 also yield a topological analogue of Theorem [B]: Within the compact metric space
$\mathcal{P}_d$, the matrices that are irreducible, aperiodic, invertible and nonresonant form a
residual set, that is, a set whose complement is the countable union of nowhere dense
sets. Being Benford, therefore, is a typical property for $P \in \mathcal{P}_d$ not only under a
probabilistic perspective but under a topological perspective as well.
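The eigenvalues stated in Remark 4.7(i) can be confirmed numerically. The sketch below assumes the three-state matrix carries the prefactor $\frac{1}{40}$, which is what makes every row sum to one; the function name `remark_chain` and the random seed are illustrative choices.

```python
import numpy as np

def remark_chain(x, y, z):
    """Three-state transition matrix of Remark 4.7(i); the factor 1/40 is
    assumed so that each row sums to one."""
    return np.array([[x + 4, x,     36 - 2 * x],
                     [y,     y + 4, 36 - 2 * y],
                     [z + 2, z + 2, 36 - 2 * z]]) / 40.0

rng = np.random.default_rng(2)
x, y, z = rng.uniform(0.0, 1.0, size=3)
P = remark_chain(x, y, z)

eig = np.sort(np.linalg.eigvals(P).real)      # all three eigenvalues are real
expected = np.sort([1.0, 0.1, (x + y - 2 * z) / 40.0])
print(eig)
print(expected)
```

Because $\lambda_2 = 0.1$ for every realization, $\log \lambda_2 = -1$ is rational regardless of $X, Y, Z$, which is the source of the resonance.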

5 Simulations 

In this section, numerical simulations will illustrate the theoretical results of previous 
sections, and based on these simulations the rate of convergence towards BL will 
be discussed. Since it is not possible to observe the empirical frequencies of infinite
sequences, $(P^n - P^*)$ and $(P^{n+1} - P^n)$ are simulated up to a predefined value of $n$, such
as $n = 1000$ or $n = 10000$, and the empirical distributions of first significant digits of
each component are compared to the Benford probabilities. For some Markov chains, 
simulations up to n = 1000 yield empirical frequencies very close to BL, whereas for 
others even n = 10000 does not give a good approximation, although theoretically 
all chains considered here follow BL. Thus, convergence rates towards BL may differ 
significantly.

Example 5.1.

From Table 1 it is clear that the sequences $(2^n)$, $(n!)$, $(F_n)$ give different empirical
frequencies for the simulation up to $n = 1000$. Compared to the other two, $(F_n)$ gives
empirical frequencies much closer to BL.
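The comparison in Example 5.1 is easy to reproduce with exact integer arithmetic. The following sketch is illustrative rather than the authors' code: it tabulates first-digit frequencies of $(2^n)$, $(n!)$ and the Fibonacci numbers $(F_n)$ for $n = 1, \dots, 1000$, with the convention $F_1 = F_2 = 1$ assumed here.

```python
import math

def first_digit(m):
    """First significant digit of a positive integer."""
    return int(str(m)[0])

def digit_freqs(seq):
    """Relative frequencies of the first digits 1..9 over a finite sequence."""
    counts = [0] * 9
    total = 0
    for m in seq:
        counts[first_digit(m) - 1] += 1
        total += 1
    return [c / total for c in counts]

def powers_of_two(n_max):
    return (2 ** n for n in range(1, n_max + 1))

def factorials(n_max):
    value = 1
    for n in range(1, n_max + 1):
        value *= n
        yield value

def fibonacci(n_max):
    a, b = 1, 1                      # F_1 = F_2 = 1
    for _ in range(n_max):
        yield a
        a, b = b, a + b

N = 1000
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
table = {"2^n": digit_freqs(powers_of_two(N)),
         "n!": digit_freqs(factorials(N)),
         "F_n": digit_freqs(fibonacci(N))}
for name, freqs in table.items():
    print(name, [round(f, 3) for f in freqs])
print("BL", [round(b, 3) for b in benford])
```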

Similarly, rates of convergence can be discussed for Markov chains. The important
question is what property is creating the difference in convergence rates. Theorem [B]
shows that every homogeneous Markov chain chosen independently and continuously
is Benford with probability one. Besides irreducibility and aperiodicity, nonresonance
is crucial. Irreducibility and aperiodicity do not determine the rate of convergence.
This leaves nonresonance as the only source for different rates of convergence.
According to Definition 3.1, nonresonance is based on the rational independence of 1,
$\log L_0$ and the elements of $\frac{1}{2\pi} \arg \Lambda_0$, provided that $\Lambda_0 \ne 0$. Thus, it is natural to
expect this rational independence to be reflected in some quantitative manner in the
rate of convergence towards BL.

It is well known that there are infinitely many rational approximations for a given
accuracy to any irrational number. Let $x$ be an irrational number. Given any $\varepsilon > 0$,
there exist infinitely many pairs $(p, q) \in \mathbb{Z} \times \mathbb{N}$ with $\gcd(p, q) = 1$ and

$$\left| x - \frac{p}{q} \right| < \varepsilon .$$

One way to obtain rational approximations of irrational numbers is provided by the
method of continued fractions. Every irrational real number $x$ is represented uniquely
by its continued fraction expansion

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} ,$$

also denoted as $x = [a_0; a_1, a_2, a_3, \dots]$, where $a_0 \in \mathbb{Z}$ and $a_n \in \mathbb{N}$ for $n \ge 1$ are referred
to as the partial quotients of $x$. By [11, Theorem 149], if $p_n$ and $q_n$ are defined
iteratively as

$$p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2}, \quad \forall n \ge 2,$$

$$q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2}, \quad \forall n \ge 2,$$

then, for all $n \in \mathbb{N}$,

$$\frac{p_n}{q_n} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}} =: [a_0; a_1, \dots, a_n] ;$$

the rational numbers $p_n/q_n$ are called the convergents of the continued fraction of $x$.
Leaving aside trivial exceptions, best rational approximations to an irrational $x$ are
of the form $p_n/q_n$, and

$$\left| x - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}, \quad \forall n \ge 2. \qquad (8)$$

It is clear from (8) that $p_n/q_n$ yields a particularly good approximation of $x$ when
$a_{n+1}$ is large. Hence $x$ can be rapidly approximated if its continued fraction expansion
contains a sequence of rapidly increasing partial quotients. On the other hand, if $(a_n)$
does not grow fast (or at all), then it is difficult to approximate $x$ by a rational
number with small error, see [16] for details. For example, [16, Ch. 2, Theorem
3.4] asserts that if $(a_n)$ is bounded for some $x$ then the distribution mod 1 of $(nx)$
approaches the uniform distribution rather quickly. Thus irrationals which are hard
to approximate by rational numbers, due to a small upper bound on, or slow growth
of $(a_n)$, are also the ones for which one expects to see fast convergence to Benford
probabilities. Specifically, for the golden ratio $\frac{1 + \sqrt{5}}{2} = [1; 1, 1, 1, \dots]$, every $a_n$ has
the smallest possible value. Since $\bigl| \log F_n - n \log \frac{1 + \sqrt{5}}{2} + \frac{1}{2} \log 5 \bigr| \to 0$ as $n \to \infty$, this may
explain why the convergence to BL is faster for the Fibonacci sequence than for the
other two sequences in Example 5.1. (See [17] for further insights on BL for continued
fractions.)
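The recurrences for $p_n$ and $q_n$ and the bound (8) can be illustrated with a few lines of code. The sketch below is not from the paper: it uses Python's `decimal` module at a working precision of 120 digits (an arbitrary choice, ample for a dozen terms) and takes as input $x = \log_{10} \frac{1 + \sqrt{21}}{20}$, the logarithm of the modulus of the eigenvalue $\frac{1}{20}(-1 - \sqrt{21})$ of the matrix in Example 5.2. The function names are hypothetical.

```python
from decimal import Decimal, getcontext, ROUND_FLOOR
from fractions import Fraction

getcontext().prec = 120   # working precision in significant digits

def partial_quotients(x, count):
    """First `count` partial quotients a_0, a_1, ... of the Decimal x."""
    quotients = []
    for _ in range(count):
        a = x.to_integral_value(rounding=ROUND_FLOOR)
        quotients.append(int(a))
        frac = x - a
        if frac == 0:
            break
        x = 1 / frac
    return quotients

def convergents(quotients):
    """Convergents p_n/q_n via p_n = a_n p_{n-1} + p_{n-2}, and likewise q_n."""
    p_prev, q_prev = 1, 0                  # formal values p_{-1}, q_{-1}
    p, q = quotients[0], 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

x = ((1 + Decimal(21).sqrt()) / 20).log10()
a = partial_quotients(x, 12)
conv = convergents(a)
holds = []
for n in range(len(a) - 1):
    err = abs(x - Decimal(conv[n].numerator) / Decimal(conv[n].denominator))
    bound = 1 / (Decimal(a[n + 1]) * conv[n].denominator ** 2)
    holds.append(err < bound)             # inequality (8) for this n
print(a)
print(conv)
print(holds)
```

A large partial quotient $a_{n+1}$ shows up directly as a small bound, and hence a small error, for the preceding convergent.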

It is important to note that $(a_n)$ is unbounded for almost every $x$, [11, Theorem
196]. Hence, in most simulations it is not possible to observe convergence as fast as for 
the Fibonacci sequence. However, to highlight the difference in rates of convergence 
and irrationality, two examples are studied. The first 50 partial quotients are given 
for every relevant irrational number that arises. 



Example 5.2. (Markov chain showing fast convergence)

Let $d = 3$ and

$$P = \begin{bmatrix} 0.25 & 0.35 & 0.40 \\ 0.30 & 0.45 & 0.25 \\ 0.65 & 0.15 & 0.20 \end{bmatrix}.$$

The eigenvalues of $P$ are $\lambda_1 = 1$ and $\lambda_{2,3} = \frac{1}{20}(-1 \mp \sqrt{21})$, hence
$\sigma(P)^+ \setminus \{\lambda_1\} = \{-\frac{1}{20} - \frac{1}{20}\sqrt{21}, \; -\frac{1}{20} + \frac{1}{20}\sqrt{21}\}$. Since
$\log |\lambda_2|$ and $\log |\lambda_3|$ are irrational and different, $P$ is nonresonant. Thus Theorem [A]
implies that the Markov chain defined by $P$ is Benford.

Table 2 shows the empirical frequencies of significant digits for the first 1000 and
10000 terms of $(P^n - P^*)$, respectively; the behavior of $(P^{n+1} - P^n)$ is very similar.

Table 2: Comparing empirical frequencies for the first significant digits with Benford
probabilities for the first 1000 (top half) and 10000 (bottom half) terms of the
sequences $(P^n - P^*)^{(i,j)}$, where $P$ is the transition probability matrix in Example 5.2.
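Frequencies of the kind reported in Table 2 can be approximated with standard double-precision tools. The sketch below is illustrative and is not the authors' simulation code: it assumes $P$ is diagonalizable, writes $P^n - P^*$ as a sum of $\lambda_k^n$ times rank-one matrices over the eigenvalues $\lambda_k \ne 1$, and tracks $\log_{10}$ of each entry so that the rapidly decaying powers do not underflow. It also assumes the chosen entry never vanishes exactly. The function name `digit_freqs_of_entry` is hypothetical.

```python
import numpy as np

def digit_freqs_of_entry(P, i, j, n_max):
    """First-digit frequencies of the (i, j) entry of P^n - P*, n = 1..n_max,
    computed from the spectral decomposition of a diagonalizable P."""
    lam, V = np.linalg.eig(P)
    W = np.linalg.inv(V)
    keep = [k for k in range(len(lam)) if not np.isclose(lam[k], 1.0)]
    # Factor out the largest modulus below 1 so that powers stay representable;
    # the common factor r**n only shifts log10 by n * log10(r).
    r = max(abs(lam[k]) for k in keep)
    log_r = np.log10(r)
    counts = np.zeros(9, dtype=int)
    for n in range(1, n_max + 1):
        scaled = sum((lam[k] / r) ** n * V[i, k] * W[k, j] for k in keep)
        log_abs = n * log_r + np.log10(abs(scaled))
        digit = min(int(10.0 ** (log_abs % 1.0)), 9)   # guard against rounding
        counts[digit - 1] += 1
    return counts / n_max

P = np.array([[0.25, 0.35, 0.40],
              [0.30, 0.45, 0.25],
              [0.65, 0.15, 0.20]])
freqs = digit_freqs_of_entry(P, 0, 0, 1000)
benford = np.log10(1 + 1 / np.arange(1, 10))
print(np.round(freqs, 3))
print(np.round(benford, 3))
```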



Since $|\lambda_2| > |\lambda_3|$, all that matters is how well

$$\log |\lambda_2| = [-1; 2, 4, 8, 1, 5, 1, 6, 3, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 66, 5, 1, 1, 2, 1, 3,
1, 2, 1, 1, 3, 1, 3, 2, 3, 2, 7, 3, 86, 1, 1, 1, 1, 1, 26, 3, 1, 5, 3, 1, 5, \dots]$$

is approximated by rational numbers. From the above, $a_n \le 86$ for all $1 \le n \le 50$,
and a rapid increase of quotients is not observed. This continued fraction expansion
should be compared to the ones in the example below.

Example 5.3. (Markov chain showing slow convergence)

Let $d = 3$ and

$$P = \begin{bmatrix} 0.8 & 0.1 & 0.1 \\ 0.3 & 0.3 & 0.4 \\ 0.4 & 0.0 & 0.6 \end{bmatrix}$$

with eigenvalues $\lambda_1 = 1$ and $\lambda_{2,3} = \frac{7}{20} \pm \frac{1}{20}\sqrt{3}\, i$.
Thus $\sigma(P)^+ \setminus \{\lambda_1\} = \{\frac{7}{20} + \frac{1}{20}\sqrt{3}\, i\} =: \Lambda_0$, and the behavior of significant digits is
governed by the two irrational numbers

$$\log |\lambda_2| = [-1; 1, 1, 3, 1, 7, 1, 15, 1, 2, 1, 1, 7, 1, 6, 2, 1, 3, 1, 1, 2, 4, 1, 1, 2, 3,
8, 1, 2, 1, 1, 2, 1, 2, 1, 7, 1, 1, 2, 1, 33, 1, 2, 1, 2, 1, 1, 11, 1, 24, 8, \dots],$$

$$\tfrac{1}{2\pi} \arg \lambda_2 = [0; 25, 1, 9, 3, 168, 2, 1, 1, 32, 1, 6, 3, 1, 9, 1, 1, 92, 2, 13, 2, 1, 1, 10, 2, 5,
1, 3, 1, 1, 1, 1, 3, 1, 2, 7, 1, 5, 1, 1, 4, 1, 3, 14, 3, 10, 1, 1, 3, 1, 3, \dots].$$

Note that $\max_{n=1}^{50} a_n = 33$ for $\log |\lambda_2|$, whereas $\max_{n=1}^{50} a_n = 168$ for $\frac{1}{2\pi} \arg \lambda_2$. When
compared with Example 5.2, the repeated early high values in the continued fraction
expansion of $\frac{1}{2\pi} \arg \lambda_2$ suggest a somewhat slower convergence to BL. As shown in
Table 3, this slower convergence is clearly recognizable in simulations of $(P^n - P^*)$;
again the behavior of $(P^{n+1} - P^n)$ is very similar.
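The two quantities that govern Example 5.3 are straightforward to compute. The sketch below is illustrative: it takes the transition matrix of the example, selects the non-real eigenvalue with positive imaginary part (expected to be $\frac{7}{20} + \frac{1}{20}\sqrt{3}\, i$), and evaluates $\log_{10}$ of its modulus and its argument divided by $2\pi$ in double precision. That precision is enough for the first few partial quotients only, not for all 50.

```python
import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.3, 0.4],
              [0.4, 0.0, 0.6]])

eig = np.linalg.eigvals(P)
lam = eig[np.argmax(eig.imag)]            # eigenvalue with positive imaginary part
log_modulus = np.log10(abs(lam))          # governs the radial behavior
arg_over_2pi = np.angle(lam) / (2 * np.pi)  # governs the rotational behavior
print(lam)
print(log_modulus, arg_over_2pi)
```

The reciprocal of `arg_over_2pi` lies between 25 and 26, matching the first partial quotient of the expansion quoted in the example.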



6 Applications 

In scientific calculations using digital computers and floating point arithmetic,
roundoff errors are inevitable, and as Knuth points out in his classic text The Art of
Computer Programming [14, pp. 253-255],

In order to analyze the average behavior of floating-point arithmetic al- 
gorithms (and in particular to determine their average running time), we 
need some statistical information that allows us to determine how often 
various cases arise . . . [If, for example, the] leading digits tend to be small 
[that] makes the most obvious techniques of average error estimation for 
floating-point calculations invalid. The relative error due to rounding is 
usually . . . more than expected. 

Thus for the problem of numerical estimation of $P^*$ from $P^n$, it is important
to study the distribution of significant digits (or, equivalently, the fraction parts of
floating-point numbers) of the components of $(P^n - P^*)$ and $(P^{n+1} - P^n)$.

Table 3: Comparing empirical frequencies for the first significant digits with Benford
probabilities for the first 1000 (top half) and 10000 (bottom half) terms of the
sequences $(P^n - P^*)^{(i,j)}$, where $P$ is the transition probability matrix in Example 5.3.



Theorem [B] above shows that the components of both $(P^n - P^*)$ and $(P^{n+1} - P^n)$
typically exhibit exactly the type of nonuniformity of significant digits alluded to by
Knuth: Not only do the first few significant digits of the differences between the
components of the successive $n$-step transition matrices $P^n$ and the limiting distribution
$P^*$, as well as the differences between $P^{n+1}$ and $P^n$ tend to be small but, much more
specifically, they typically follow BL.

This prevalence of BL has important practical implications for estimating $P^*$
from $P^n$ using floating-point arithmetic. One type of error in scientific calculations
is overflow (or underflow), which occurs when the running calculations exceed the 
largest (or smallest, in absolute value) floating-point number allowed by the computer. 
Feldstein and Turner show that [HI p. 241], "[u]nder the assumption of the logarithmic 
distribution of numbers (i.e., BL) floating-point addition and subtraction can result 
in overflow and underflow with alarming frequency ...". Together with Theorem [B],
this suggests that special attention should be given to overflow and underflow errors
in any computer algorithm used to estimate $P^*$ from $P^n$.

Another important type of error in scientific computing is due to roundoff. In es- 
timating P* from P", for example, every stopping rule, such as "stop when n=1000" 
or "stop when the components in (P'^+i —P^^ are less than 10^^*^", will result in some 
error, and Theorem [B] shows that this difference is generally Benford. In fact, justified
by heuristics and by the extensive empirical evidence of BL in other numerical calcu- 
lations, analysis of roundoff errors has often been carried out under the hypothesis of a 
logarithmic statistical distribution (cf. [9, p. 326]). Therefore, as Knuth pointed out, 
a naive assumption of uniformly distributed significands in the calculations tends to 
underestimate the average relative roundoff error in cases where the actual statistical 
distribution of fraction parts is skewed toward smaller leading significant digits, as is 
the case in BL. To obtain a rough idea of the magnitude of this underestimate when 
the true statistical distribution is BL, let X denote the absolute roundoff error at the 
time of stopping the algorithm, and let $Y$ denote the fraction part of the approximation
at the time of stopping. Then the relative error is $X/Y$, and assuming that $X$
and $Y$ are independent random variables, the average (i.e., expected) relative error
is simply $\mathbb{E}X \cdot \mathbb{E}(1/Y)$. Thus if $Y$ is assumed to be uniformly distributed on $[1, 10)$,
ignoring the fact that $Y$ is Benford creates an average underestimation of the relative
error by more than one third (cf. [3]).
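The size of this underestimate can be made explicit. The short calculation below is a sketch under the assumptions stated in the text ($Y$ takes values in $[1, 10)$, and $X$ and $Y$ are independent); the subscripts "unif" and "BL" are notation introduced here only to distinguish the two candidate distributions of $Y$, with the Benford density on $[1, 10)$ taken to be $1/(y \ln 10)$.

```latex
\mathbb{E}_{\mathrm{unif}}\!\left(\frac{1}{Y}\right)
  = \frac{1}{9}\int_1^{10} \frac{\mathrm{d}y}{y} = \frac{\ln 10}{9} \approx 0.2558,
\qquad
\mathbb{E}_{\mathrm{BL}}\!\left(\frac{1}{Y}\right)
  = \int_1^{10} \frac{1}{y} \cdot \frac{\mathrm{d}y}{y \ln 10} = \frac{9}{10 \ln 10} \approx 0.3909,
\qquad
\frac{\mathbb{E}_{\mathrm{unif}}(1/Y)}{\mathbb{E}_{\mathrm{BL}}(1/Y)}
  = \frac{10\,(\ln 10)^2}{81} \approx 0.6546 .
```

Under these assumptions the uniform model accounts for only about 65% of the expected relative error, an underestimate of roughly 34.5%.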

As one potential application of Theorems [A] and [B], it should be possible to adapt
the current plethora of BL-based goodness-of-fit statistical tests for detecting fraud
(e.g. [7]), to the problem of detecting whether or not a sequence of realizations of a
finite-state process originates from a Markov chain, i.e., whether or not the process
is Markov. By Theorem [B], conformance with BL for the differences $P^{n+1} - P^n$ is
typical in finite-state Markov chains, so a standard (e.g. chi-squared) goodness-of-fit
test to BL of the empirical estimates of the differences between $P^{n+1}$ and $P^n$ may help
detect non-Markov behavior.
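A chi-squared statistic of the kind mentioned above takes only a few lines. The sketch below is a generic first-digit test and not a procedure specified in the paper: the function names are hypothetical, the 5% critical value 15.507 is the standard 95% quantile of the chi-squared distribution with 8 degrees of freedom, and the usual independence assumption behind that critical value need not hold for consecutive terms of a deterministic sequence, so the comparison is heuristic.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CHI2_CRIT_DF8_5PCT = 15.507   # 95% quantile, chi-squared with 8 degrees of freedom

def first_significant_digit(x):
    """First significant digit (base 10) of a nonzero real number."""
    return int(f"{abs(x):.15e}"[0])

def benford_chi2(values):
    """Chi-squared statistic of the first digits of `values` against BL."""
    digits = [first_significant_digit(v) for v in values if v != 0]
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * BENFORD[d - 1]
        observed = digits.count(d)
        stat += (observed - expected) ** 2 / expected
    return stat

# A Benford sequence versus a sequence with nearly uniform first digits.
stat_pow2 = benford_chi2([2.0 ** n for n in range(1, 1001)])
stat_unif = benford_chi2(range(1, 1000))
print(stat_pow2, stat_unif, CHI2_CRIT_DF8_5PCT)
```

In an application along the lines suggested above, `values` would hold the empirical estimates of the entries of $P^{n+1} - P^n$.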
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